Somatic hypermutation systems

ABSTRACT

The present application relates to somatic hypermutation (SHM) systems and synthetic genes. Synthetic genes can be designed using computer-based approaches to increase or decrease susceptibility of a polynucleotide to somatic hypermutation. Genes of interest are inserted into the vectors and subjected to activation-induced cytidine deaminase to induce somatic hypermutation. Proteins or portions thereof encoded by the modified genes can be introduced into a SHM system for somatic hypermutation and proteins or portions thereof exhibiting a desired phenotype or function can be isolated for in vitro or in vivo diagnostic or therapeutic uses.

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Application No. 60/902,414 (Attorney docket no. 33547-705.101), filed Feb. 20, 2007, U.S. Provisional Application No. 60/904,622 (Attorney docket no. 33547-706.101), filed Mar. 1, 2007, U.S. Provisional Application No. 61/020,124 (Attorney docket no. 33547-706.102), filed Jan. 9, 2008, and U.S. Provisional Application No. 60/995,970 (Attorney docket no. 33547-708.101), filed Sep. 28, 2007, each of which applications is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

The market for recombinant protein therapeutics has increased steadily for the last quarter century. In 2005, six of the top 20 drugs were proteins, and overall, biopharmaceutical drugs accounted for revenues of approximately $40 billion, of which approximately $17 billion was based on the sales of monoclonal antibodies.

Monoclonal antibodies represent a distinct class of biotherapeutics with a great deal of promise. The antibody scaffold is well tolerated in the clinic, and glycosylated IgG molecules have favorable pharmacokinetic and pharmacodynamic properties. There is the potential for rapidly selecting new drug candidates that vary little from currently marketed drugs. Issues relating to non-mechanism based toxicity, and the manufacturing and formulation of antibody products, are known and consistent across the therapeutic group, which reduces the potential failure rate associated with this class of drug candidates as compared to small molecule therapeutics.

In contrast to traditional small molecule based approaches, therapeutic antibodies have significant advantages, including the facts that (i) they can be generated and validated quickly; (ii) they exhibit fewer side effects and have improved safety profiles; (iii) they have well understood pharmacokinetic characteristics, and can be optimized to create long half-life products with reduced dosing frequency; (iv) they are versatile and exhibit flexibility in drug function; (v) scale-up and manufacturing processes are robust and well understood; (vi) they have a proven track record of clinical and regulatory success; and (vii) the current regulatory environment makes the introduction of competitive generic, or bioequivalent, antibodies both difficult and costly.

Even given the success of monoclonal antibodies, the antibody-as-drug modality continues to evolve and remains subject to inefficiency. Further, intrinsic biological bias within the native immune system often works against the more rapid development of improved therapeutics. These limitations include (i) the long development time for the isolation of biologically active antibodies with affinity constants of therapeutic caliber, (ii) the inability to raise antibodies to certain classes of protein targets (intractable targets), and (iii) the intrinsic affinity ceiling inherent in immune system based affinity selection.

Specifically, there is a need for methods to more rapidly develop antibodies with improved pharmacokinetics, cross-reactivity, safety profiles and superior dosing regimens. Central to this need is the development of methods that enable the systematic analysis of potential epitopes within a protein, and enable the selective development of antibodies with the desired selectivity profiles.

There are several existing well-established methods of developing monoclonal antibodies; however, many of these technologies have specific disadvantages that limit their ability to rapidly evolve the best clinical candidates. These technological limitations include: (i) mouse immunization and hybridoma technology cannot be used iteratively and often fail to yield an antibody with desired characteristics due to antigen intractability; (ii) phage display or panning often fails to yield monoclonal antibodies with affinity constants of therapeutic caliber, and cannot easily be used to select and co-evolve entire heavy and light chains; and (iii) rational design strategies often provide an incomplete solution, and are based solely on existing knowledge.

One approach uses random or semi-random mutagenesis (for example, error prone PCR) in conjunction with in vitro molecular evolution. This approach is based on the creation of random changes in protein structure and the generation of large libraries of mutant polynucleotides that are subsequently screened for improved variants, usually through the expression of the encoded proteins within a living cell. From these libraries, a few improved proteins can be selected for further optimization.

Such in vitro mutation approaches can be limited by the inability to systematically search a portion of any given sequence, and by the relative difficulty of detecting very rare improvement mutants out of a large number of mutations. This fundamental problem arises because the total number of possible mutants for a reasonably sized protein is large. For example, a 100 amino acid protein has a potential diversity of 20¹⁰⁰ different sequences of amino acids, while existing high throughput screening methodologies are, in some cases, limited to a maximum screening capacity of 10⁷-10⁸ samples per week. Additionally, such approaches are relatively inefficient because of redundant codon usage, in which up to around 3¹⁰⁰ of the nucleotide sequences possible for a 100 amino acid residue protein actually encode the same amino acids and protein (Gustafsson et al. (2004) Codon bias and heterologous protein expression. Trends Biotech. 22 (7) 346-353).
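The scale mismatch described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and constant names are illustrative only, and the screening rate uses the upper end of the 10⁷-10⁸ range quoted in the text.

```python
from math import log10

PROTEIN_LENGTH = 100        # residues, as in the example in the text
AMINO_ACIDS = 20            # standard amino acid alphabet
SCREENS_PER_WEEK = 10**8    # upper end of the quoted screening capacity


def sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return alphabet ** length


def weeks_to_screen(total: int, per_week: int = SCREENS_PER_WEEK) -> int:
    """Weeks needed to screen every variant once (ceiling division)."""
    return -(-total // per_week)


total = sequence_space(PROTEIN_LENGTH)
print(f"log10(possible sequences) = {log10(total):.1f}")
print(f"log10(weeks to screen all) = {log10(weeks_to_screen(total)):.1f}")
```

Even at the highest quoted throughput, exhaustive screening is out of reach by more than a hundred orders of magnitude, which is the motivation for focused rather than fully random diversity.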

A more sophisticated approach uses a mixture of random mutagenesis with recombination between protein domains in order to select for improved proteins (Stemmer Proc. Natl. Acad. Sci. (1994) 91 (22) 10747-51). This approach exploits natural design concepts inherent in protein structures across families of proteins, but again requires significant recombinant DNA manipulation and screening capacity of a large number of sequences to identify rare improvements. Both approaches require extensive follow-up mutagenesis and analysis to understand the significance of each mutation, and to identify the best combination of the many thousands or millions of mutants identified.

SUMMARY OF THE INVENTION

The present invention is based on the development of a system to design and make or generate SHM susceptible and SHM resistant DNA sequences, within a cell or a cell-free environment. The present invention is further based on the development of a SHM system that is stable over a suitable time period to reproducibly maintain increased and/or decreased rates of SHM without affecting structural portions or polypeptides or structural proteins, transcriptional control regions and selectable markers. The system allows for stable maintenance of a mutagenesis system that provides for high level targeted SHM in a polynucleotide of interest, while sufficiently preventing non-specific mutagenesis of structural proteins, transcriptional control regions and selectable markers.

In part, the present system is based upon the creation of a more stable version of activation induced cytidine deaminase (AID) that can provide for high level sustained SHM.

In the present application, in vitro somatic hypermutation (SHM) systems involve the use of in vitro SHM in conjunction with directed evolution and bioinformatic analysis to create integrated systems that include, but are not limited to, optimized, controlled systems for codon design and usage, library design, screening, selection and integrated systems for data mining. These systems include:

I. An expression system designed to create SHM susceptible and/or SHM resistant DNA sequences, within a cell or a cell-free environment. The system enables the stable maintenance of a mutagenesis system that provides for high level targeted SHM in a polynucleotide of interest, while significantly preventing non-specific mutagenesis of structural proteins, transcriptional control regions and selectable markers.

II. Polynucleotide libraries that are focused in size and specificity, are enriched for those functions of interest, and are efficient substrates for SHM to seed in situ diversity creation upon exposure to AID.

III. A process based on computational analysis of protein structure, intra-species and inter-species sequence variation, and the functional analysis of protein activity for selecting optimal epitopes that provide for the selection of antibodies with superior selectivity, cross species reactivity, and blocking activity.

Provided herein is a method to design polynucleotide sequences to either maximize or minimize the tendency of a polynucleotide to undergo SHM, while at the same time maximizing protein expression, RNA stability, and the presence of conveniently located restriction enzyme sites.

Also provided herein are synthetic or semi-synthetic versions of a polynucleotide that are optimized to either enhance or decrease the impact of SHM on the rate of mutagenesis of that polynucleotide compared to its wild type's susceptibility to undergo SHM (i.e., SHM susceptible or SHM resistant).

SHM susceptible sequences enable the rapid evolution of polynucleotides which are designed based on codon usage to be more susceptible to SHM; optimized polynucleotides can be exposed to AID and expressed as polypeptides. Conversely, SHM resistant sequences enable the rapid evolution of polynucleotides which are designed based on codon usage to be less susceptible (resistant) to SHM; optimized polynucleotides can be exposed to AID and expressed as polypeptides. Modified versions of the encoded polypeptides can be selected for improved function or increased stability and resistance to SHM.

The system described herein combines the power of rational design with accelerated random mutagenesis and directed evolution.

Also included in the invention are SHM resistant polynucleotide sequences that enable important conserved domains to be spared from mutagenesis, while simultaneously targeting desired sequences.

Polynucleotides to which these methods are applicable include any polynucleotide sequence that can be transcribed and for which a functional activity can be devised for screening.

The overall result of the integration of these approaches is an integrated system for creating targeted diversity in situ, and for the automated analysis and selection of proteins with improved traits.

In certain embodiments, the present invention is based in part on an improved understanding of the context of multiple rounds of SHM within the reading frame of a polynucleotide sequence, and the underlying logic relationships of codon usage patterns.

Provided herein is a SHM susceptible synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, wherein said SHM susceptible synthetic gene exhibits a higher rate of activation induced cytidine deaminase (AID)-mediated mutagenesis compared to said unmodified polynucleotide sequence. Preferred hot spot SHM codons or preferred hot spot SHM motifs are, for example, codons including, but not limited to, AAC, TAC, TAT, AGT and AGC. Such sequences may be embedded within the context of a larger SHM motif, recruit SHM mediated mutagenesis and generate targeted amino acid diversity at that codon.
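The replacement of lower-probability codons by preferred hot spot codons can be illustrated with a small sketch. The Python below is a simplified illustration, not the design method of the invention: it swaps only synonymous codons for asparagine and serine (under the standard genetic code) to the preferred codons AAC and AGC listed above, so the encoded protein is unchanged. The choice of AGC over AGT for serine, the helper names, and the example sequence are assumptions made for illustration; the sketch ignores the larger motif context, expression level, RNA stability and restriction sites.

```python
# Preferred hot spot codons named in the text.
PREFERRED_HOT_CODONS = {"AAC", "TAC", "TAT", "AGT", "AGC"}

# Synonymous swaps under the standard genetic code (assumed mapping):
# asparagine AAT -> AAC; serine TCN -> AGC. Both tyrosine codons
# (TAC, TAT) are already on the preferred list.
SYNONYMOUS_SWAPS = {
    "AAT": "AAC",
    "TCT": "AGC", "TCC": "AGC", "TCA": "AGC", "TCG": "AGC",
}


def split_codons(orf: str) -> list[str]:
    """Split an in-frame open reading frame into codons."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def count_hot_codons(orf: str) -> int:
    """Count in-frame codons that are on the preferred hot spot list."""
    return sum(codon in PREFERRED_HOT_CODONS for codon in split_codons(orf))


def make_more_susceptible(orf: str) -> str:
    """Replace codons with synonymous preferred hot spot codons."""
    return "".join(SYNONYMOUS_SWAPS.get(c, c) for c in split_codons(orf))


example = "ATGTCTAATTACGGT"  # hypothetical ORF fragment: M S N Y G
redesigned = make_more_susceptible(example)
print(example, count_hot_codons(example))
print(redesigned, count_hot_codons(redesigned))
```

An SHM resistant design would run the same loop with a swap table that moves away from these codons; the cold spot definitions needed for that table are not given in this section.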

The present invention also contemplates that a SHM susceptible synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

In one embodiment, the SHM susceptible synthetic gene encodes a protein or portion thereof having about 99%, about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, or any percentage between about 50% and about 100% amino acid identity to the protein or portion thereof encoded by an unmodified gene.

In one embodiment, the SHM susceptible synthetic gene exhibits a higher rate of AID-mediated mutagenesis including, but not limited to, 1.05-fold, 1.1-fold, 1.25-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 50-fold, 100-fold, 200-fold, 500-fold, 1000-fold or more, or any range therebetween.

In one embodiment, the SHM susceptible synthetic gene exhibits a rate of AID-mediated mutagenesis at a level which is at least about 101%, at least about 105%, at least about 110%, at least about 115%, at least about 120%, at least about 125%, at least about 130%, at least about 135%, at least about 140%, at least about 145%, at least about 150%, at least about 200%, at least about 250%, at least about 300%, at least about 350%, at least about 400%, at least about 450%, at least about 500%, or higher of that exhibited by an unmodified gene.

In one embodiment, provided herein is a SHM susceptible gene encoding a protein or a portion thereof wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, said synthetic gene having a greater density of hot spot motifs than said unmodified polynucleotide sequence.

In yet another non-limiting aspect, said synthetic gene includes one or more amino acid mutations that introduce preferred SHM hot spot motifs.

Provided herein is a SHM resistant synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a lower probability of SHM, wherein said SHM resistant synthetic gene exhibits a lower rate of AID-mediated mutagenesis compared to said unmodified polynucleotide sequence.

The present invention also contemplates that a SHM resistant synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

In one embodiment, the SHM resistant synthetic gene encodes a protein or portion thereof having about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, or any percentage between about 50% and about 100% amino acid identity to the protein or portion thereof encoded by an unmodified gene.

In one embodiment, the SHM resistant synthetic gene exhibits a lower rate of AID-mediated mutagenesis including, but not limited to, 1.05-fold, 1.1-fold, 1.2-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 50-fold, 100-fold, 200-fold, 500-fold, 1000-fold or less, or any range therebetween.

In one embodiment, the SHM resistant synthetic gene exhibits a rate of AID-mediated mutagenesis at a level which is less than about 99%, less than about 95%, less than about 90%, less than about 85%, less than about 80%, less than about 75%, less than about 70%, less than about 65%, less than about 60%, less than about 55%, or less than about 50%, of that exhibited by an unmodified gene.

In another embodiment, provided herein is a SHM resistant synthetic gene encoding a protein or a portion thereof wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a lower probability of SHM, said synthetic gene having a greater density of cold spots than said unmodified polynucleotide sequence.

Provided herein is a selectively targeted, SHM optimized synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM; and one or more third SHM motifs in said unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more fourth SHM motifs having a lower probability of SHM; wherein said selectively targeted, SHM optimized synthetic gene exhibits targeted AID-mediated mutagenesis. In such an embodiment, the selectively targeted, SHM optimized synthetic gene has portions that exhibit a higher rate of AID-mediated mutagenesis and other portions that exhibit a lower rate of AID-mediated mutagenesis.

In yet another embodiment, provided herein is a selectively targeted, SHM optimized synthetic gene encoding a protein or portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, said synthetic gene having a greater density of hot spot motifs than said unmodified polynucleotide sequence; and one or more third SHM motifs in said unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more fourth SHM motifs having a lower probability of SHM, said synthetic gene having a greater density of cold spots than said unmodified polynucleotide sequence; wherein said selectively targeted, SHM optimized synthetic gene exhibits targeted AID-mediated mutagenesis. In one embodiment, the selectively targeted, SHM optimized synthetic gene has portions that exhibit a higher rate of AID-mediated mutagenesis and other portions that exhibit a lower rate of AID-mediated mutagenesis.

The present invention also contemplates that a selectively targeted SHM optimized synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

In one non-limiting aspect, the synthetic gene includes one or more conservative or semi-conservative amino acid mutations to modulate hot spot or cold spot density, and said synthetic gene encodes a protein or portion thereof having about 90% or greater amino acid sequence identity compared to said unmodified gene.

In another non-limiting aspect, the synthetic gene includes one or more conservative or semi-conservative amino acid mutations to modulate hot spot or cold spot density, and said synthetic gene encodes a protein or portion thereof having about 70% or greater amino acid sequence identity compared to said unmodified gene.

In another non-limiting aspect, the synthetic gene includes one or more conservative or semi-conservative amino acid mutations to modulate hot spot or cold spot density, and said synthetic gene encodes a protein or portion thereof having about 50% or greater amino acid sequence identity compared to said unmodified gene.

In yet another non-limiting aspect, said synthetic gene includes one or more amino acid mutations that introduce preferred SHM hot spot motifs.

In one non-limiting example, a synthetic gene does not include genes comprising a stop motif inserted into an open reading frame.

In one aspect, a protein or portion thereof encoded by a synthetic gene is selected from among, but not limited to, antibodies or antigen-binding fragments thereof, selectable markers, reporter proteins, cytokines, chemokines, growth factors, hormones, neurotransmitters, enzymes, receptors, structural proteins, toxins, co-factors and transcription factors.

Provided herein is an expression vector, comprising at least one synthetic gene. In one aspect, the expression vector is an integrating expression vector. When the expression vector is an integrating expression vector, the expression vector can further comprise one or more sequences to direct recombination. In another aspect, the expression vector is an episomal expression vector. In yet another aspect, the expression vector is a viral expression vector.

Provided herein is a eukaryotic cell comprising a synthetic gene as described herein. In one aspect, the eukaryotic cell is a mammalian cell or a yeast cell.

Provided herein is a prokaryotic cell comprising a synthetic gene as described herein. In one aspect, the prokaryotic cell is an Escherichia coli cell.

Provided herein is a method for preparing a gene product having a desired property, comprising: a) preparing a synthetic gene encoding a gene product which exhibits increased SHM; b) expressing said synthetic gene in a population of cells; wherein said population of cells expresses AID, or can be induced to express AID via the addition of an inducing agent; and c) selecting a cell or cells within the population of cells which express a mutated gene product having the desired property. In one aspect, the method, optionally, further comprises activating or inducing the expression of AID in said population of cells. In another aspect, the method, optionally, further comprises establishing one or more clonal populations of cells from the cell or cells identified in (c). In yet another aspect of the method, at least one synthetic gene is located in an expression vector such as any one of the vectors described elsewhere herein. In one aspect of the method, the cell is a cell as described elsewhere herein.

Provided herein is a method for preparing a gene product having a desired property, comprising: a) expressing said gene product in a population of cells; wherein said population of cells comprises at least one synthetic gene which exhibits decreased SHM; and wherein said population of cells expresses AID, or can be induced to express AID via addition of an inducing agent; and b) selecting a cell or cells within the population of cells which express a mutated gene product (a polypeptide encoded by the mutated synthetic gene, the gene having one or more mutations) having the desired property. In one aspect, the method, optionally, further comprises activating or inducing the expression of AID in said population of cells. In another aspect, the method, optionally, further comprises establishing one or more clonal populations of cells from the cell or cells identified in (b). In yet another aspect of the method, at least one synthetic gene is located in an expression vector such as any one of the vectors described elsewhere herein. In one aspect of the method, the cell is a cell as described elsewhere herein.

Provided herein is an in vitro hypermutation system, comprising: a) a polynucleotide comprising a synthetic gene; b) a recombinant AID; and c) an in vitro expression system. The in vitro system can further comprise a polymerase to amplify nucleic acids after transcription. The in vitro system can further comprise an in vitro translation system. In one aspect, the polynucleotide is located in an expression vector such as any one of the vectors described elsewhere herein. The in vitro system can further comprise a cell population of a cell as described elsewhere herein.

Provided herein is a kit for in vitro mutagenesis, comprising: a) a recombinant AID protein; b) one or more reagents for in vitro transcription; and c) instructions for design or use of a synthetic gene. The kit can further comprise one or more reagents for in vitro translation. The kit can further comprise an expression vector such as, for example, any one of the expression vectors as described herein. The kit can further comprise a cell population of a cell as described elsewhere herein.

INCORPORATION BY REFERENCE

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the features and advantages of the present invention can be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 and FIG. 2 show the 20 most common codon transitions observed in complementarity determining regions (CDRs) and framework regions (FWs or FRs) during SHM-mediated affinity maturation and demonstrate how simple frame shifts can determine the two radically different patterns of mutagenesis seen in CDRs and FWs. These observations lead directly to a hypothesis that both functional selection during affinity maturation and the reading frame context determine the amino acid diversity generated at SHM hot spot codons.

FIG. 1 Shows that within CDRs, the codons AGC, TAT, and TAC (which encode serine and tyrosine) feed a directed flow of primary, secondary and tertiary SHM events generating amino acid diversity. Within CDRs, the most common codon transition observed is AGC to AAC (785 instances), leading to a serine to asparagine conversion. While that transition is also common in framework regions (354 instances), a simple frame shift of the same mutation in the same hot spot motif ( . . . TACAGCTAT . . . ; SEQ ID NO: 42) context leads to a CAG to CAA silent mutation that is common in framework regions (288 instances) but not commonly observed in CDRs.
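The reading frame effect described for the TACAGCTAT motif can be reproduced in a few lines. The Python below is an illustrative sketch: it applies the same G to A substitution to the motif and reports the affected codon in two reading frames, using the standard genetic code. The function name and the frame and position conventions (0-based) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

MOTIF = "TACAGCTAT"  # hot spot motif context quoted in the text


def codon_change(seq: str, pos: int, new_base: str, frame: int):
    """Return (old codon, new codon, old residue, new residue) for the
    codon containing 0-based position `pos`, read in the given frame."""
    if pos < frame:
        raise ValueError("position lies before the first full codon")
    start = frame + 3 * ((pos - frame) // 3)
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old, new = seq[start:start + 3], mutated[start:start + 3]
    return old, new, CODON_TABLE[old], CODON_TABLE[new]


# The same G->A substitution (0-based position 4), two reading frames.
print(codon_change(MOTIF, 4, "A", frame=0))  # AGC -> AAC, Ser -> Asn
print(codon_change(MOTIF, 4, "A", frame=2))  # CAG -> CAA, Gln -> Gln (silent)
```

The identical nucleotide change is therefore non-synonymous in one frame and silent in the other, which is the contrast the figure description draws between CDRs and framework regions.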

FIG. 2 Shows that in contrast to FIG. 1, the most commonly observed codon (amino acid) transition events in framework regions generate silent mutations.

FIG. 3 Shows that all of the motifs encoded by the WAC library, given in any of the three possible reading frames, produce a concatenation of hot spots, and compares these motifs with all other possible 4096 6-mer nucleotide combinations for their ability to recruit SHM-mediated machinery. Longer assemblies result in the same high density of SHM “hot spots” with no “cold spots.” This assembly of degenerate codons (WACW) results in a subset of possible 4-mer hot spots described by Rogozin et al. (WRCH), where R=A or G, H=A or C or T, and W=T or A.
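The WRCH definition above translates directly into a pattern match. The following Python sketch counts WRCH hot spots in a sequence and reports a density per 100 nucleotides; scanning the reverse complement as well, the density normalization, and all function names are assumptions made for illustration rather than the scoring used to produce the figures.

```python
import re

# WRCH with W = A/T, R = A/G, H = A/C/T, as defined in the text.
# The lookahead lets overlapping motifs be counted.
WRCH = re.compile(r"(?=[AT][AG]C[ACT])")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_wrch(seq: str) -> int:
    """Number of (possibly overlapping) WRCH motifs on the given strand."""
    return len(WRCH.findall(seq.upper()))


def wrch_density(seq: str, both_strands: bool = True) -> float:
    """WRCH motifs per 100 nt; optionally include the opposite strand."""
    hits = count_wrch(seq)
    if both_strands:
        hits += count_wrch(reverse_complement(seq))
    return 100.0 * hits / len(seq)


# Every WACW 4-mer is also a WRCH motif (R = A, and W is a subset of H).
wacw = [w1 + "AC" + w2 for w1 in "AT" for w2 in "AT"]
print(all(count_wrch(m) == 1 for m in wacw))

print(count_wrch("TACAGCTAT"), round(wrch_density("TACAGCTAT"), 1))
```

Comparing this density between an unmodified sequence and a redesigned one gives a simple, if coarse, measure of the "greater density of hot spot motifs" referred to in the embodiments.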

FIG. 4 Preferred SHM hot spot codons AAC and TAC, which are the basis for a synthetic library, can result in a set of primary and secondary mutation events that create considerable amino acid diversity, as judged by equivalent SHM mutation events observed in Ig heavy chain antibodies. From these two codons, basic amino acids (histidine, lysine, arginine), an acidic amino acid (aspartate), hydrophilic amino acids (serine, threonine, asparagine, tyrosine), hydrophobic amino acids (alanine and phenylalanine), and glycine are generated as a result of SHM events.

FIG. 5 The distribution of all 4096 6-mer nucleotide z-scores describing the hotness or coldness of the motif to SHM-mediated mutation. The z-scores for all permutations of 6-mers in the WRC synthetic library are superimposed on this distribution, with the dashed line denoting the top 5% of all possible motifs.

FIG. 6 The series of mutation events that lead to the creation of amino acid diversity, starting from “preferred SHM hot spot codons” AGC and TAC, as observed in affinity matured IGV heavy chain sequences. 4200 primary and secondary mutation events, starting from codons encoding asparagine and tyrosine, lead to a set of functionally diverse amino acids.

FIG. 7 Illustrates the convergence of sequence optimization with progressive iterations of replacement using the program SHMredesign. The figure shows both optimization towards an idealized hot and cold sequence, in this case starting with unmodified canine AID nucleotide sequence.

FIG. 8 Provides the amino acid (A; SEQ ID NO: 294), and polynucleotide sequence (B; SEQ ID NO: 26) of unmodified blasticidin gene. Also shown is the initial analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) as illustrated by bold capital letters.

FIG. 9 Provides the amino acid (A; SEQ ID NO: 294), and polynucleotide sequence (B; SEQ ID NO: 295) of a synthetic, SHM resistant version of the blasticidin gene. Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the synthetic sequence as illustrated by bold capital letters.

FIG. 10 Provides the amino acid (A; SEQ ID NO: 294), and polynucleotide sequence (B; SEQ ID NO: 296) of a synthetic, SHM susceptible version of the blasticidin gene. Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the synthetic sequence as illustrated by bold capital letters.

FIG. 11 Provides the polynucleotide sequence (A; SEQ ID NO: 297) of the unmodified form of the hygromycin gene. Also shown is the initial analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 102 CpG methylation sites were present (data not shown).

FIG. 12 Provides the polynucleotide sequence (A; SEQ ID NO: 298) of a synthetic, SHM resistant (cold) form of the hygromycin gene. Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 41 CpG methylation sites were present (data not shown).

FIG. 13 Provides the polynucleotide sequence (A; SEQ ID NO: 299) of a synthetic, SHM susceptible (hot) form of the hygromycin gene. Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 32 CpG methylation sites were present (data not shown).

FIG. 14 Provides the polynucleotide sequence (A; SEQ ID NO: 300) of an unmodified form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 40 CpG methylation sites were present (data not shown).

FIG. 15 Provides the polynucleotide sequence (A; SEQ ID NO: 301) of a synthetic SHM susceptible (hot) form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 14 CpG methylation sites were present (data not shown).

FIG. 16 Provides the polynucleotide sequence (A; SEQ ID NO: 302) of a synthetic SHM resistant (cold) form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 21 CpG methylation sites were present (data not shown).

FIG. 16D shows the mutations for a representative segment of the hot and cold TFP constructs. The central row shows the amino acid sequence of TFP (residues 59 through 87) in single letter format (SEQ ID NO: 378), and the “hot” and “cold” starting nucleic acid sequences encoding the two constructs are shown above (hot; SEQ ID NO: 379) and below (cold) the amino acid sequence (SEQ ID NO: 380). Mutations observed in the hot sequence are aligned and stacked on top of the gene sequences, while mutations in the cold TFP sequence are shown below. The results illustrate how “silent” changes to the coding sequences generate dramatic changes in observed AID-mediated SHM rates, demonstrating that engineered sequences can be effectively optimized to create fast or slow rates of SHM.

FIG. 16E shows that the spectrum of mutations generated by AID in the present in vitro tissue culture system mirrors those observed in other studies and those seen during in vivo affinity maturation. FIG. 16E shows the mutations generated in the present study (box (i), upper left, n=118), and compares them with mutations observed by Zan et al. (box (ii), upper right, n=702), Wilson et al. (box (iii), lower left, n=25000), and a larger analysis of IGHV chains that have undergone affinity maturation (box (iv), lower right, n=101,926). The Y-axis in each chart indicates the starting nucleotide, the X-axis indicates the end nucleotide, and the number in each square indicates the percentage (%) of time that nucleotide transition is observed. In the present study, the frequency of mutation transitions and transversions was similar to those seen in other data sets. Mutations of C to T and G to A are the direct result of AID activity on cytidines and account for 48% of all mutation events. In addition, mutations at bases A and T account for ~30% of mutation events (i.e., slightly less than frequencies observed in other datasets).

FIG. 16F shows that mutation events are distributed throughout the SHM optimized nucleotide sequence of the hot TFP gene, with a maximum instantaneous rate of about 0.08 events per 1000 nucleotides per generation centered around 300 nucleotides from the beginning of the open reading frame. Stable transfection and selection of a gene with AID (for 30 days) produces a maximum rate of mutation of 1 event per 480 nucleotides. As a result, genes may contain zero, one, two or more mutations per gene.
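The statement that genes may carry zero, one, two or more mutations can be illustrated with a simple Poisson estimate. This is a sketch under stated assumptions: mutation events are treated as independent and uniformly distributed at the maximum rate of 1 event per 480 nucleotides (the figure itself shows the rate is not uniform), and the 720-nucleotide gene length is a hypothetical value chosen only for the arithmetic.

```python
import math

NUCLEOTIDES_PER_EVENT = 480   # maximum rate quoted for FIG. 16F
GENE_LENGTH_NT = 720          # hypothetical gene length, for illustration only


def expected_events(gene_length_nt, nt_per_event=NUCLEOTIDES_PER_EVENT):
    """Mean number of mutation events per gene copy at the quoted rate."""
    return gene_length_nt / nt_per_event


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = expected_events(GENE_LENGTH_NT)
p_zero = poisson_pmf(0, lam)
p_one = poisson_pmf(1, lam)
p_two_or_more = 1.0 - p_zero - p_one
```

Under these assumptions the mean is 1.5 events per gene copy, so a sequenced population would be expected to contain a mixture of unmutated, singly mutated and multiply mutated genes.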

FIG. 16G Illustrates the distribution of SHM-mediated events observed in hot TFP sequenced genes compared to the significantly reduced pattern of mutations seen in cold TFP (FIG. 16H).

FIG. 17 Provides a sequence comparison of activation-induced cytidine deaminase (AID) from Homo sapiens (human; SEQ ID NO: 303), Mus musculus (mouse; SEQ ID NO: 304), Canis familiaris (dog; SEQ ID NO: 305), Rattus norvegicus (rat; SEQ ID NO: 306) and Pan troglodytes (chimpanzee; SEQ ID NO: 307). Variations between the species are represented by bold amino acids.

FIG. 18 Provides the amino acid (A; SEQ ID NO: 308), and polynucleotide sequence (B; SEQ ID NO: 309) of canine cytidine deaminase (AID) (L198A). Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the unmodified sequence as illustrated by bold capital letters.

FIG. 19 Provides the polynucleotide sequence (A; SEQ ID NO: 310) of a synthetic SHM susceptible form of canine AID. Also shown is the analysis of hot spots (B), cold spots (C) and occurrences of CpGs (D) as illustrated by bold capital letters.

FIG. 20 Provides the polynucleotide sequence (A; SEQ ID NO: 311) of a synthetic SHM resistant form of canine AID. Also shown is the analysis of hot spots (B), cold spots (C) and occurrences of CpGs (D) as illustrated by bold capital letters.

FIG. 21 Provides a sequence comparison of genomic Canis familiaris (dog; SEQ ID NO: 312) and SHM-resistant (cold) Canis familiaris (dog; SEQ ID NO: 313), Homo sapiens (human; SEQ ID NO: 314) and Mus musculus (mouse; SEQ ID NO: 315) mRNA activation-induced cytidine deaminase (AID) sequences. GAG sequences are illustrated by bold, underlining. Variations between the species are represented by bold amino acids.

FIG. 22 FIG. 22A Shows the predicted effect of AID activity on reversion frequency using a protein containing a mutable stop codon such as a fluorescent protein. FIG. 22B shows the actual rates of loss of fluorescence achieved (shown as GFP extinction) with cells transfected with two different concentrations of an expression vector capable of expressing AID, and stably expressing GFP. FIG. 22C shows the initial rates of GFP reversion, directly comparing wild type human AID and cold canine AID. Also shown is the effect of Ig enhancers on reversion rate.

FIG. 23 Provides the amino acid (A; SEQ ID NO: 316), and polynucleotide sequence (B; SEQ ID NO: 317) of unmodified Pol eta as well as the analysis of the number of hot spots, cold spots and CpGs (C).

FIG. 24 Provides the polynucleotide sequence (A; SEQ ID NO: 318) of a synthetic SHM resistant (cold) form of Pol eta as well as the analysis of the number of hot spots, cold spots and CpGs (B).

FIG. 25 Provides the polynucleotide sequence (A; SEQ ID NO: 319) of a synthetic SHM susceptible (hot) form of Pol eta as well as the analysis of the number of hot spots, cold spots and CpGs (B).

FIG. 26 Provides the amino acid sequence of unmodified Pol theta (SEQ ID NO: 320).

FIG. 27 Provides the polynucleotide sequence of unmodified Pol theta (SEQ ID NO: 321).

FIG. 28 Provides the polynucleotide sequence of a synthetic SHM resistant (cold) form of Pol theta (SEQ ID NO: 322).

FIG. 29 Provides the polynucleotide sequence of a synthetic SHM susceptible (hot) form of Pol theta (SEQ ID NO: 323).

FIG. 30 Provides the amino acid (A; SEQ ID NO: 324), and polynucleotide sequence (B; SEQ ID NO: 325) of unmodified uracil DNA glycosylase as well as the analysis of the number of hot spots, cold spots and CpGs (C).

FIG. 31 Provides the polynucleotide sequence of a synthetic SHM resistant (cold) form of uracil DNA glycosylase (A; SEQ ID NO: 326) and a synthetic SHM susceptible (hot) form of uracil DNA glycosylase (B; SEQ ID NO: 327).

FIG. 32 Provides a schematic of Vector Formats 1 (A) & 2 (B).

FIG. 33 Provides a schematic of Vector Formats 3 (A) & 4 (B).

FIG. 34 Provides a schematic of Vector Format 5.

FIG. 35 Provides a schematic of Vectors F1 (A) and F7 (B).

FIG. 36 Provides schematics of Vector AB102. Restriction sites and genetic elements are illustrated in 36A and fragments used in construction of the vector are illustrated in 36B.

FIG. 37 Provides a schematic of Vectors AB184 (A) and F10 (B).

FIG. 38 Provides a schematic of Vector ANA209.

FIG. 39 Provides the polynucleotide sequence of unmodified NisB (A; SEQ ID NO: 328) and the initial analysis of hot spots, cold spots and CpG content (B).

FIG. 40 Provides the polynucleotide sequence of a SHM resistant (cold) form of NisB (A; SEQ ID NO: 329) and the initial analysis of hot spots, cold spots and CpG content (B).

FIG. 41 Provides the polynucleotide sequence of unmodified NisP (A; SEQ ID NO: 330) and the initial analysis of hot spots, cold spots and CpG content (B).

FIG. 42 Provides the polynucleotide sequence of a SHM resistant (cold) form of NisP (A; SEQ ID NO: 331) and the initial analysis of hot spots, cold spots and CpG content (B).

FIG. 43 Provides the polynucleotide sequence of unmodified NisT (A; SEQ ID NO: 332) and SHM resistant (cold) form of NisT (B; SEQ ID NO: 333).

FIG. 44 Provides the polynucleotide sequence of unmodified NisA (A; SEQ ID NO: 334), as well as the initial analysis of hot spots (B) and cold spots (C). Also shown is a synthetic form of NisA (SEQ ID NO: 335) showing areas of SHM resistant sequence (D; underlined) and SHM susceptible sequence (D; non-underlined), and the analysis of hot spots (E) and cold spots (F).

FIG. 45 Provides the polynucleotide sequence of unmodified NisC (A; SEQ ID NO: 336), as well as the initial analysis of hot spots (B) and cold spots (C).

FIG. 46 Shows a synthetic resistant (cold) form of NisC (A; SEQ ID NO: 337) showing the analysis of hot spots (B) and cold spots (C).

FIG. 47 Provides a diagram of the synthesis and maturation of Nisin (47A). FIG. 47B illustrates a backbone trace of the protein NisC, as described in the pdb structure 2GOD (rcsb.org/pdb/), with residues in the binding pocket highlighted. A zinc metal and several cysteine, histidine and aspartate residues are the residues that carry out cyclization of NisA. Here, residues within 10 Angstroms (Å) of the catalytic site are labeled and shown with a surface representation.

FIG. 48 FIG. 48A shows cells expressing the 30 pM HyHEL antibody (dark gray) or no antibody (light gray) after selection by incubating with streptavidin microparticles conjugated to the mature Hen Egg Lysozyme (HEL) protein (Protein), HEL peptide monomer (Monomer), tandem HEL dimer (Tandem), HEL MAPS dimer (MAPS) or naked unconjugated streptavidin microparticles (Naked). FIG. 48B shows cells expressing the 800 pM HyHEL antibody (dark gray) or no antibody (light gray) after selection by incubating with tosylactivated microparticles conjugated to either the mature HEL protein (Protein) or naked unconjugated tosylactivated microparticles (Naked).

FIG. 49 FIG. 49A Shows cells expressing the 30 pM HyHEL antibody that were selected by incubating with streptavidin microparticles conjugated to the mature HEL protein (Protein), HEL peptide monomer (Monomer), tandem HEL dimer (Tandem), HEL MAPS dimer (MAPS) or naked unconjugated streptavidin microparticles (Naked). FIG. 49B shows cells expressing the 800 pM HyHEL antibody that were selected by incubating with tosylactivated microparticles conjugated to either the mature HEL protein (Protein) or naked unconjugated tosylactivated microparticles (Naked).

FIG. 50 shows the 20 most frequent hot spot codon hypermutation transition events within the FR and CDR regions of heavy chain antibodies, where the numbers labeling the arrows indicate how often a codon transition event was observed. The codons AGC and AGT (Serine), and to a lesser extent TAC and TAT (Tyrosine), account for ~50% of the originating mutations observed in affinity matured antibodies. Use of these hot spot codons within the correct reading frame, combined with affinity maturation, leads to many fewer observed silent mutations within CDRs compared to framework regions (highlighted by dotted circles in the figure).

FIG. 51 HEK-293 cells transfected with a low affinity anti-HEL antibody (comprising the light chain mutation N31G) and a constitutive AID expression vector either after stable transfection and selection (panels A and C) or transiently with the addition of re-transfected AID expression vector (panels B and D) were incubated with either 50 pM HEL-FITC (A and B) or 500 pM HEL-FITC (C and D) and living HEL-FITC-binding cells were sorted and expanded in culture for another round of selection and sequence analysis.

FIG. 52 Previously sorted HEK-293 cells expressing anti-HEL antibodies and constitutive canine AID either after stable transfection and selection (panels A and C) or transiently with the addition of re-transfected AID expression vector (panels B and D) were incubated with either 50 pM HEL-FITC (A and B) or 500 pM HEL-FITC (C and D) and living HEL-FITC-binding cells were sorted and expanded in culture for another round of selection and sequence analysis.

FIG. 53 HEK-293 cells transfected with a low affinity anti-HEL antibody and evolved over 4 rounds of selection and evolution were analyzed by incubation with 50 pM HEL-FITC, as described in Example 13. Over the 4 rounds of evolution, a clear increase in positive cells is evident in both the FACS scatter plot (panel A) and the total number of positive cells gated (panel B).

FIG. 54 Panel A shows a selection of amino acid sequences around the HyHEL10 light chain CDR1, showing the evolved sequence around the site of the Asn 31 mutation introduced in the starting constructs (SEQ ID NOS: 367, 368 and 369). Panel B shows the corresponding nucleic acid sequences (SEQ ID NOS: 370, 371 and 372), and panel C shows a representation of the measured affinity of the evolved mutants.

DETAILED DESCRIPTION OF THE INVENTION

I. Definitions

As used herein and in the appended claims, the terms “a,” “an” and “the” can mean, for example, one or more, or at least one, of a unit unless the context clearly dictates otherwise. Thus, for example, reference to “an antibody” includes a plurality of such antibodies and reference to “a variable domain” includes reference to one or more variable domains and equivalents thereof known to those skilled in the art, and so forth. In the event that there is a plurality of definitions for a term herein, those in this section prevail unless stated otherwise.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges can independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

The term “specific binding member” describes a member of a pair of molecules which have binding specificity for one another. The members of a specific binding pair can be naturally derived or wholly or partially synthetically produced. One member of the pair of molecules has an area on its surface, or a cavity, which specifically binds to and is therefore complementary to a particular spatial and/or polar organization of the other member of the pair of molecules. Thus, the members of the pair have the property of binding specifically to each other. Examples of types of specific binding pairs include antigen-antibody, Avimer™-substrate, biotin-avidin, hormone-hormone receptor, receptor-ligand, protein-protein, and enzyme-substrate.

The term “antibody” describes an immunoglobulin whether natural or partly or wholly synthetically produced. The term also covers any polypeptide or protein having a binding domain which is, or is homologous to, an antigen-binding domain. CDR grafted antibodies are also contemplated by this term.

“Native antibodies” and “native immunoglobulins” are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each light chain is, in some cases, linked to a heavy chain by one covalent disulfide bond, while the number of disulfide linkages varies among the heavy chains of different immunoglobulin isotypes. Each heavy and light chain also has regularly spaced intrachain disulfide bridges. Each heavy chain has at one end a variable domain (“V_(H)”) followed by a number of constant domains (“C_(H)”). Each light chain has a variable domain at one end (“V_(L)”) and a constant domain (“C_(L)”) at its other end; the constant domain of the light chain is aligned with the first constant domain of the heavy chain, and the light-chain variable domain is aligned with the variable domain of the heavy chain. Particular amino acid residues are believed to form an interface between the light- and heavy-chain variable domains.

The term “variable domain” refers to protein domains that differ extensively in sequence among family members (i.e. among different isoforms, or in different species). With respect to antibodies, the term “variable domain” refers to the variable domains of antibodies that are used in the binding and specificity of each particular antibody for its particular antigen. However, the variability is not evenly distributed throughout the variable domains of antibodies. It is concentrated in three segments called hypervariable regions both in the light chain and the heavy chain variable domains. The more highly conserved portions of variable domains are called the “framework region” or “FR.” The variable domains of unmodified heavy and light chains each comprise four FRs (FR1, FR2, FR3 and FR4, respectively), largely adopting a β-sheet configuration, connected by three hypervariable regions, which form loops connecting, and in some cases forming part of, the β-sheet structure. The hypervariable regions in each chain are held together in close proximity by the FRs and, with the hypervariable regions from the other chain, contribute to the formation of the antigen-binding site of antibodies (see Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md. (1991), pages 647-669). The constant domains are not involved directly in binding an antibody to an antigen, but exhibit various effector functions, such as participation of the antibody in antibody-dependent cellular toxicity.

The term “hypervariable region” when used herein refers to the amino acid residues of an antibody which are responsible for antigen-binding. The hypervariable region comprises amino acid residues from three “complementarity determining regions” or “CDRs,” which directly bind, in a complementary manner, to an antigen and are known as CDR1, CDR2, and CDR3 respectively.

In the light chain variable domain, the CDRs correspond to approximately residues 24-34 (CDRL1), 50-56 (CDRL2) and 89-97 (CDRL3), and in the heavy chain variable domain the CDRs correspond to approximately residues 31-35 (CDRH1), 50-65 (CDRH2) and 95-102 (CDRH3) (Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md. (1991)) and/or those residues from a “hypervariable loop” (i.e., residues 26-32 (L1), 50-52 (L2) and 91-96 (L3) in the light chain variable domain and 26-32 (H1), 53-55 (H2) and 96-101 (H3) in the heavy chain variable domain; Chothia and Lesk, J. Mol. Biol. 196:901-917 (1987)).
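The approximate Kabat CDR boundaries recited above can be expressed as a simple lookup. The following is an illustrative sketch only; the names `KABAT_CDRS` and `kabat_region` are hypothetical, and the sketch assumes plain integer Kabat positions, ignoring the lettered insertion positions (for example 52a or 100a) that Kabat numbering also uses.

```python
# Approximate Kabat CDR boundaries (inclusive), as recited in the text above.
KABAT_CDRS = {
    "light": {"CDRL1": (24, 34), "CDRL2": (50, 56), "CDRL3": (89, 97)},
    "heavy": {"CDRH1": (31, 35), "CDRH2": (50, 65), "CDRH3": (95, 102)},
}


def kabat_region(chain, position):
    """Return the CDR name containing an integer Kabat position, or "FR"
    if the position falls in a framework region of the variable domain."""
    for name, (start, end) in KABAT_CDRS[chain].items():
        if start <= position <= end:
            return name
    return "FR"


# Example: classify a few positions of each chain.
heavy_52 = kabat_region("heavy", 52)
light_40 = kabat_region("light", 40)
```

A lookup of this kind only separates CDR from framework positions; it does not distinguish FR1 through FR4, and it uses the Kabat ranges rather than the Chothia hypervariable loop ranges.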

As used herein, “variable framework region” or “VFR” refers to framework residues that form a part of the antigen binding pocket and/or groove that may contact antigen. In some embodiments, the framework residues form a loop that is a part of the antigen binding pocket or groove. The amino acid residues in the loop may or may not contact the antigen. In an embodiment, the loop amino acids of a VFR are determined by inspection of the three-dimensional structure of an antibody, antibody heavy chain, or antibody light chain. The three-dimensional structure can be analyzed for solvent accessible amino acid positions as such positions are likely to form a loop and/or provide antigen contact in an antibody variable domain. Some of the solvent accessible positions can tolerate amino acid sequence diversity and others (e.g. structural positions) can be less diversified. The three-dimensional structure of the antibody variable domain can be derived from a crystal structure or protein modeling. In some embodiments, the VFR comprises, consists essentially of, or consists of amino acid positions corresponding to amino acid positions 71 to 78 of the heavy chain variable domain, the positions defined according to Kabat et al., 1991. In some embodiments, VFR forms a portion of Framework Region 3 located between CDRH2 and CDRH3. Preferably, VFR forms a loop that is well positioned to make contact with a target antigen or form a part of the antigen binding pocket.

Depending on the amino acid sequence of the constant domain of their heavy chains, immunoglobulins can be assigned to different classes. There are five major classes of immunoglobulins: IgA, IgD, IgE, IgG, and IgM, and several of these can be further divided into subclasses (isotypes), e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2. The heavy-chain constant domains (Fc) that correspond to the different classes of immunoglobulins are called α, δ, ε, γ, and μ, respectively. The subunit structures and three-dimensional configurations of different classes of immunoglobulins are well known.

The “light chains” of antibodies (immunoglobulins) from any vertebrate species can be assigned to one of two clearly distinct types, called kappa (“κ”) and lambda (“λ”), based on the amino acid sequences of their constant domains.

The terms “antigen-binding portion of an antibody,” “antigen-binding fragment,” “antigen-binding domain,” “antibody fragment” or a “functional fragment of an antibody” are used interchangeably in the present invention to mean one or more fragments of an antibody that retain the ability to specifically bind to an antigen (see, e.g., Holliger et al., Nature Biotech. 23 (9): 1126-1129 (2005)). Non-limiting examples of antibody fragments included within the term “antigen-binding portion” of an antibody include (i) a Fab fragment, a monovalent fragment consisting of the V_(L), V_(H), C_(L) and C_(H1) domains; (ii) a F(ab′)₂ fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; (iii) a Fd fragment consisting of the V_(H) and C_(H1) domains; (iv) a Fv fragment consisting of the V_(L) and V_(H) domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a V_(H) domain; and (vi) an isolated complementarity determining region (CDR). Furthermore, although the two domains of the Fv fragment, V_(L) and V_(H), are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker that enables them to be made as a single protein chain in which the V_(L) and V_(H) regions pair to form monovalent molecules (known as single chain Fv (scFv); see e.g., Bird et al. (1988) Science 242:423-426; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883; and Osbourn et al. (1998) Nat. Biotechnol. 16:778). Such single chain antibodies are also intended to be encompassed within the term “antigen-binding portion” of an antibody.
Any V_(H) and V_(L) sequences of specific scFv can be linked to human immunoglobulin constant region cDNA or genomic sequences, in order to generate expression vectors encoding complete IgG molecules or other isotypes. V_(H) and V_(L) can also be used in the generation of Fab, Fv or other fragments of immunoglobulins using either protein chemistry or recombinant DNA technology. Other forms of single chain antibodies, such as diabodies, are also encompassed.

“F(ab′)₂” and “Fab′” moieties can be produced by treating immunoglobulin (monoclonal antibody) with a protease such as pepsin or papain, and include antibody fragments generated by digesting immunoglobulin near the disulfide bonds existing between the hinge regions in each of the two H chains. For example, papain cleaves IgG upstream of the disulfide bonds existing between the hinge regions in each of the two H chains to generate two homologous antibody fragments in which an L chain composed of V_(L) (L chain variable region) and C_(L) (L chain constant region), and an H chain fragment composed of V_(H) (H chain variable region) and C_(Hγ1) (γ1 region in the constant region of H chain) are connected at their C terminal regions through a disulfide bond. Each of these two homologous antibody fragments is called Fab′. Pepsin also cleaves IgG downstream of the disulfide bonds existing between the hinge regions in each of the two H chains to generate an antibody fragment slightly larger than the fragment in which the two above-mentioned Fab′ are connected at the hinge region. This antibody fragment is called F(ab′)₂.

The Fab fragment also contains the constant domain of the light chain and the first constant domain (C_(H)1) of the heavy chain. Fab′ fragments differ from Fab fragments by the addition of a few residues at the carboxyl terminus of the heavy chain C_(H)1 domain including one or more cysteine(s) from the antibody hinge region. Fab′-SH is the designation herein for Fab′ in which the cysteine residue(s) of the constant domains bear a free thiol group. F(ab′)₂ antibody fragments originally were produced as pairs of Fab′ fragments which have hinge cysteines between them. Other chemical couplings of antibody fragments are also known.

“Fv” is the minimum antibody fragment which contains a complete antigen-recognition and antigen-binding site. This region consists of a dimer of one heavy chain and one light chain variable domain in tight, non-covalent association. It is in this configuration that the three hypervariable regions of each variable domain interact to define an antigen-binding site on the surface of the V_(H)-V_(L) dimer. Collectively, the six hypervariable regions confer antigen-binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three hypervariable regions specific for an antigen) has the ability to recognize and bind antigen, although at a lower affinity than the entire binding site.

“Single-chain Fv” or “sFv” antibody fragments comprise the V_(H) and V_(L) domains of an antibody, wherein these domains are present in a single polypeptide chain. In some embodiments, the Fv polypeptide further comprises a polypeptide linker between the V_(H) and V_(L) domains which enables the sFv to form the desired structure for antigen binding. For a review of sFv molecules, see, e.g., Pluckthun in The Pharmacology of Monoclonal Antibodies, Vol. 113, Rosenburg and Moore eds. Springer-Verlag, New York, pp. 269-315 (1994).

The term “Avimer™” refers to a new class of therapeutic proteins that are of human origin, which are unrelated to antibodies and antibody fragments, and are composed of several modular and reusable binding domains, referred to as A-domains (also referred to as class A module, complement type repeat, or LDL-receptor class A domain). They were developed from human extracellular receptor domains by in vitro exon shuffling and phage display (Silverman et al., 2005, Nat. Biotechnol. 23:1493-94; Silverman et al., 2006, Nat. Biotechnol. 24:220). The resulting proteins can comprise multiple independent binding domains that can exhibit improved affinity (in some cases sub-nanomolar) and specificity compared with single-epitope binding proteins. See, for example, U.S. Patent Application Publ. Nos. 2005/0221384, 2005/0164301, 2005/0053973, 2005/0089932, 2005/0048512, and 2004/0175756, each of which is hereby incorporated by reference herein in its entirety.

Each of the known 217 human A-domains comprises ~35 amino acids (~4 kDa) and domains are separated by linkers that average five amino acids in length. Native A-domains fold quickly and efficiently to a uniform, stable structure mediated primarily by calcium binding and disulfide formation. A conserved scaffold motif of only 12 amino acids is needed for this common structure. The end result is a single protein chain containing multiple domains, each of which represents a separate function. Each domain of the protein binds independently, and the energetic contributions of each domain are additive. These proteins were called “Avimer™” from avidity multimers.

As used herein, “natural” or “naturally occurring” antibodies or antibody variable domains, refers to antibodies or antibody variable domains having a sequence of an antibody or antibody variable domain identified from a non-synthetic source, for example, from a germline sequence, or differentiated antigen-specific B cell obtained ex vivo, or its corresponding hybridoma cell line, or from the serum of an animal. These antibodies can include antibodies generated in any type of immune response, either natural or otherwise induced. Natural antibodies include the amino acid sequences, and the nucleotide sequences that constitute or encode these antibodies, for example, as identified in the Kabat database.

The term “synthetic polynucleotide” or “synthetic gene” means that the corresponding polynucleotide sequence, or amino acid sequence, is derived, in whole or part, from a sequence that has been designed, or synthesized de novo, or modified as compared to an equivalent unmodified sequence. Synthetic genes can be prepared in whole or part, via chemical synthesis, or amplified via PCR, or similar enzymatic amplification systems. Synthetic genes are, in some embodiments, different from unmodified genes, either at the amino acid, or polynucleotide level, (or both) and are, for example, located within the context of synthetic expression control sequences. Synthetic gene sequences can include amino acid or polynucleotide sequences that have been changed, for example, by the replacement, deletion, or addition, of one or more, amino acids or nucleotides, thereby providing an amino acid sequence, or a polynucleotide coding sequence that is different from the source sequence. Synthetic gene polynucleotide sequences may not necessarily encode proteins with different amino acids, compared to the unmodified gene, for example, they can also encompass synthetic polynucleotide sequences that incorporate different codons or motifs, but which encode the same amino acid(s); i.e., the nucleotide changes can represent silent mutations at the amino acid level. In one embodiment, synthetic genes exhibit altered susceptibility to SHM compared to the unmodified gene. Synthetic genes can be iteratively modified using the methods described herein and, in each successive iteration, a corresponding polynucleotide sequence or amino acid sequence, is derived, in whole or part, from a sequence that has been designed, or synthesized de novo, or modified, compared to an equivalent unmodified sequence.
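Whether a synthetic coding sequence differs from its source only by silent changes can be checked by translating both sequences with the standard genetic code. The following is a minimal sketch; the function names and the two six-nucleotide example sequences are hypothetical and are not sequences disclosed in this application.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate an in-frame DNA coding sequence into single-letter amino acids."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def differs_only_silently(source, synthetic):
    """True if both sequences have equal length and encode the same protein."""
    return len(source) == len(synthetic) and translate(source) == translate(synthetic)


def nucleotide_differences(source, synthetic):
    """Count positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(source.upper(), synthetic.upper()))


# Hypothetical example: Ser-Tyr encoded with two different codon choices.
source = "AGCTAC"      # AGC (Ser), TAC (Tyr)
synthetic = "TCCTAT"   # TCC (Ser), TAT (Tyr)
silent = differs_only_silently(source, synthetic)
changed = nucleotide_differences(source, synthetic)
```

In this toy example three of the six nucleotides differ while the encoded dipeptide is unchanged, which is the sense in which a synthetic gene can differ from the unmodified gene at the polynucleotide level but not at the amino acid level.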

The terms “semi-synthetic polynucleotide” or “semi-synthetic gene,” as used herein, refer to polynucleotide sequences that consist in part of a nucleic acid sequence that has been obtained via polymerase chain reaction (PCR) or other similar enzymatic amplification system which utilizes a natural donor (e.g., peripheral blood monocytes) as the starting material for the amplification reaction. The remaining “synthetic” polynucleotides, i.e., those portions of semi-synthetic polynucleotide not obtained via PCR or other similar enzymatic amplification system, can be synthesized de novo using methods known in the art including, but not limited to, the chemical synthesis of nucleic acid sequences.

The term “synthetic variable regions” refers to synthetic polynucleotide sequences within a synthetic gene that are substantially comprised of optimal SHM hot spots and hot codons or motifs that, when combined with the activity of AID and one or more error-prone polymerases, generate a broad spectrum of potential amino acid diversity at each position. Synthetic variable regions can encode antibody or non-antibody polypeptides and can be separated by synthetic framework sequences that encompass codons or motifs that are not specifically targeted for, or susceptible to, SHM, or that are resistant to SHM.

The term “diabodies” refers to small antibody fragments with two antigen-binding sites, which fragments comprise a heavy chain variable domain (V_(H)) connected to a light chain variable domain (V_(L)) in the same polypeptide chain (V_(H)-V_(L)). By using a linker that is too short to allow pairing between the two domains on the same chain, the domains are forced to pair with the complementary domains of another chain and create two antigen-binding sites. Diabodies are described more fully in, for example, EP 404,097; WO 93/11161; and Holliger et al., Proc. Natl. Acad. Sci. USA 90:6444-6448 (1993).

Antibodies of the present invention also include heavy chain dimers, such as antibodies from camelids and sharks. Camelid and shark antibodies comprise a homodimeric pair of two chains of V-like and C-like domains (neither has a light chain). Since the V_(H) region of a heavy chain dimer IgG in a camelid does not have to make hydrophobic interactions with a light chain, the region in the heavy chain that normally contacts a light chain is changed to hydrophilic amino acid residues in a camelid. V_(H) domains of heavy-chain dimer IgGs are called V_(HH) domains. Shark Ig-NARs comprise a homodimer of one variable domain (termed a V-NAR domain) and five C-like constant domains (C-NAR domains).

In camelids, the diversity of antibody repertoire is determined by the complementarity determining regions (CDR) 1, 2, and 3 in the V_(H) or V_(HH) regions. The CDR3 in the camel V_(HH) region is characterized by its relatively long length averaging 16 amino acids (Muyldermans et al., 1994, Protein Engineering 7(9): 1129). This is in contrast to CDR3 regions of antibodies of many other species. For example, the CDR3 of mouse V_(H) has an average of 9 amino acids.

Libraries of camelid-derived antibody variable regions, which maintain the in vivo diversity of the variable regions of a camelid, can be made by, for example, the methods disclosed in U.S. Patent Application Ser. No. 20050037421, published Feb. 17, 2005.

“Humanized” forms of non-human (e.g., murine) antibodies are chimeric antibodies which contain minimal sequence derived from non-human immunoglobulin. For the most part, humanized antibodies are human immunoglobulins (recipient antibody) in which hypervariable region residues of the recipient are replaced by hypervariable region residues from a synthetic, or non-human source, such as mouse, rat, rabbit or non-human primate (donor antibody) having the desired specificity, affinity, and capacity. In some instances, framework region (FR) residues of the human immunoglobulin are replaced by corresponding non-human residues. Furthermore, humanized antibodies can comprise residues which are not found in the recipient antibody or in the donor antibody. These modifications are made to further refine antibody performance. A humanized antibody can comprise substantially all of at least one (and in some cases two) variable domain(s), in which all or substantially all of the hypervariable regions correspond to those of a non-human immunoglobulin and all or substantially all of the FRs are those of a human immunoglobulin sequence. The humanized antibody, optionally, also can comprise at least a portion of an immunoglobulin constant region (Fc), such as that of a human immunoglobulin. For further details, see Jones et al., Nature 321:522-525 (1986); Riechmann et al., Nature 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol. 2:593-596 (1992).

A “humanized antibody” of the present invention includes semi-synthetic antibodies prepared by genetic engineering and specifically includes a monoclonal antibody in which the CDR3 of the heavy and light chain is derived from a non-human monoclonal antibody (e.g., murine monoclonal antibody), or region is derived from synthetic variable region polynucleotides, as described herein (both heavy and light chain), and the constant region is derived from the synthetic human constant region templates likewise described herein and in commonly owned, priority U.S. Provisional Patent Application Nos. 60/904,622 and 61/020,124.

The term “monoclonal antibody” as used herein refers to an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible naturally occurring mutations that can be present in minor amounts. Monoclonal antibodies are highly specific, being directed against a single antigenic site. Furthermore, in contrast to conventional (polyclonal) antibody preparations, which can include different antibodies directed against different determinants (epitopes), each monoclonal antibody is directed against a single determinant on the antigen. The modifier “monoclonal” indicates the character of the antibody as being obtained from a substantially homogeneous population of antibodies, and is not to be construed as requiring production of the antibody by any particular method. For example, the monoclonal antibodies to be used in accordance with the present invention can be made by the hybridoma method first described by Kohler et al., Nature 256:495 (1975), or can be made by recombinant DNA methods (see, e.g., U.S. Pat. No. 4,816,567). In certain embodiments, the “monoclonal antibodies” can also be isolated from phage antibody libraries using the techniques described in Clackson et al., Nature 352:624-628 (1991) and Marks et al., J. Mol. Biol. 222:581-597 (1991), for example.

In other embodiments, monoclonal antibodies can be isolated and purified from the culture supernatant or ascites mentioned above by saturated ammonium sulfate precipitation, euglobulin precipitation method, caproic acid method, caprylic acid method, ion exchange chromatography (DEAE or DE52), or affinity chromatography using an anti-immunoglobulin column or protein A column.

A polyclonal antibody (antiserum) or monoclonal antibody can be produced by known methods. Namely, mammals, preferably mice, rats, hamsters, guinea pigs, rabbits, cats, dogs, pigs, goats, horses, or cows, or more preferably mice, rats, hamsters, guinea pigs, or rabbits, are immunized, for example, with an antigen mentioned above with Freund's adjuvant, if necessary. The polyclonal antibody can be obtained from the serum obtained from the animal so immunized. The monoclonal antibodies are produced as follows. Hybridomas are produced by fusing the antibody-producing cells obtained from the animal so immunized and myeloma cells incapable of producing auto-antibodies. Then the hybridomas are cloned, and clones producing the monoclonal antibodies showing the specific affinity to the antigen used for immunizing the mammal are screened.

As used herein, an “intrabody or fragment thereof” refers to antibodies that are expressed and function intracellularly. Intrabodies, in some embodiments, lack disulfide bonds and are capable of modulating the expression or activity of target genes through their specific binding activity. Intrabodies include single domain fragments such as isolated V_(H) and V_(L) domains and scFvs. An intrabody can include sub-cellular trafficking signals attached to the N or C terminus of the intrabodies to allow them to be expressed at high concentrations in the sub-cellular compartments where a target protein is located. Upon interaction with the target gene, an intrabody modulates target protein function, and/or achieves phenotypic/functional knockout by mechanisms such as accelerating target protein degradation and sequestering the target protein in a non-physiological sub-cellular compartment. Other mechanisms of intrabody-mediated gene inactivation can depend on the epitope to which the intrabody is directed, such as binding to the catalytic site on a target protein or to epitopes that are involved in protein-protein, protein-DNA or protein-RNA interactions. In one embodiment, an intrabody is a scFv.

The “cell producing an antibody reactive to a protein or a fragment thereof” of the present invention means any cell producing any of the above-described antibodies of the present invention.

The term “germline gene segments” refers to the genes from the germline (the haploid gametes and those diploid cells from which they are formed). The germline DNA contains multiple gene segments that encode a single immunoglobulin heavy or light chain. These gene segments are carried in the germ cells but cannot be transcribed and translated into heavy and light chains until they are arranged into functional genes. During B-cell differentiation in the bone marrow, these gene segments are randomly shuffled by a dynamic genetic system capable of generating more than 10^(8) specificities. Most of these gene segments are published and collected by the germline database.

As used herein, “library” refers to a plurality of polynucleotides, proteins, or cells comprising two or more non-identical members. A “synthetic library” refers to a plurality of synthetic polynucleotides, or a population of cells that comprise said plurality of synthetic polynucleotides. A “semi-synthetic library” refers to a plurality of semi-synthetic polynucleotides, or a population of cells that comprise said plurality of semi-synthetic polynucleotides. A “seed library” refers to a plurality of one or more synthetic or semi-synthetic polynucleotides, or cells that comprise said polynucleotides, that contain one or more sequences or portions thereof that have been modified to increase or decrease susceptibility to SHM, e.g., AID-mediated SHM, and that are capable, when acted upon by somatic hypermutation, of creating a library of polynucleotides, proteins or cells in situ.

As used herein, the term “antigen” refers to substances that are capable, under appropriate conditions, of inducing an immune response to the substance and of reacting with the products of the immune response. For example, an antigen can be recognized by antibodies (humoral immune response) or sensitized T-lymphocytes (T helper or cell-mediated immune response), or both. Antigens can be soluble substances, such as toxins and foreign proteins, or particulates, such as bacteria and tissue cells; however, only the portion of the protein or polysaccharide molecule known as the antigenic determinant (epitopes) combines with the antibody or a specific receptor on a lymphocyte. More broadly, the term “antigen” refers to any substance to which an antibody binds, or for which antibodies are desired, regardless of whether the substance is immunogenic. For such antigens, antibodies can be identified by recombinant methods, independently of any immune response.

As used herein, the term “affinity” refers to the equilibrium constant for the reversible binding of two agents and is expressed as Kd. Affinity of a binding protein to a ligand, such as affinity of an antibody for an epitope, can be, for example, from about 100 nanomolar (nM) to about 0.1 nM, from about 100 nM to about 1 picomolar (pM), or from about 100 nM to about 1 femtomolar (fM). As used herein, the term “avidity” refers to the resistance of a complex of two or more agents to dissociation after dilution.
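As a rough illustration of how Kd values in the ranges quoted above relate to binding, the following sketch assumes simple 1:1 reversible binding at equilibrium with a known free ligand concentration. The function name and the example concentrations are hypothetical and are not taken from the application.

```python
def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium.

    Assumes simple 1:1 reversible binding (an illustrative simplification):
    occupancy = [L] / ([L] + Kd).
    """
    return free_ligand_molar / (free_ligand_molar + kd_molar)


NANOMOLAR = 1e-9  # conversion factor from nM to mol/L

# Compare occupancy at a hypothetical 1 nM free ligand for several Kd values.
for kd_nm in (100.0, 1.0, 0.1):
    occupied = fraction_bound(1.0 * NANOMOLAR, kd_nm * NANOMOLAR)
    print(f"Kd = {kd_nm} nM: fraction bound at 1 nM ligand = {occupied:.3f}")
```

Under this simplified model, a lower Kd (higher affinity) gives higher occupancy at the same ligand concentration, and occupancy is one half when the free ligand concentration equals Kd.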

“Epitope” refers to that portion of an antigen or other macromolecule capable of forming a binding interaction that interacts with the variable region binding pocket of a binding protein. Such binding interaction can be manifested as an intermolecular contact with one or more amino acid residues of a CDR. Antigen binding can involve a CDR3 or a CDR3 pair. An epitope can be a linear peptide sequence (i.e., “continuous”) or can be composed of noncontiguous amino acid sequences (i.e., “conformational” or “discontinuous”). A binding protein can recognize one or more amino acid sequences; therefore an epitope can define more than one distinct amino acid sequence. Epitopes recognized by a binding protein can be determined by peptide mapping and sequence analysis techniques well known to one of skill in the art. A “cryptic epitope” or a “cryptic binding site” is an epitope or binding site of a protein sequence that is not exposed or is substantially protected from recognition within an unmodified polypeptide, but is capable of being recognized by a binding protein of a denatured or proteolyzed polypeptide. Amino acid sequences that are not exposed, or are only partially exposed, in the unmodified polypeptide structure are potential cryptic epitopes. If an epitope is not exposed, or only partially exposed, then it is likely that it is buried within the interior of the polypeptide. Candidate cryptic epitopes can be identified, for example, by examining the three-dimensional structure of an unmodified polypeptide.

The term “specific” is applicable to a situation in which one member of a specific binding pair will not show any significant binding to molecules other than its specific binding partner(s). The term is also applicable where, e.g., an antigen binding domain is specific for a particular epitope which is carried by a number of antigens, in which case the specific binding member carrying the antigen binding domain will be able to bind to the various antigens carrying the epitope.

The term “binding” refers to a direct association between two molecules, due to, for example, covalent, electrostatic, hydrophobic, and ionic and/or hydrogen-bond interactions, including interactions such as salt bridges and water bridges.

The term “adjuvant” refers to a compound or mixture that enhances the immune response, particularly to an antigen. An adjuvant can serve as a tissue depot that slowly releases the antigen and also as a lymphoid system activator that non-specifically enhances the immune response (Hood et al., Immunology, Second Ed., 1984, Benjamin/Cummings: Menlo Park, Calif., p. 384). Often, a primary challenge with an antigen alone, in the absence of an adjuvant, will fail to elicit a humoral or cellular immune response. Previously known and utilized adjuvants include, but are not limited to, complete Freund's adjuvant, incomplete Freund's adjuvant, saponin, mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil or hydrocarbon emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (Bacille Calmette-Guerin) and Corynebacterium parvum. Mineral salt adjuvants include, but are not limited to, aluminum hydroxide, aluminum phosphate, calcium phosphate, zinc hydroxide and calcium hydroxide. Preferably, the adjuvant composition further comprises a lipid of fat emulsion comprising about 10% (by weight) vegetable oil and about 1-2% (by weight) phospholipids. Preferably, the adjuvant composition further optionally comprises an emulsion form having oily particles dispersed in a continuous aqueous phase, having an emulsion forming polyol in an amount of from about 0.2% (by weight) to about 49% (by weight), optionally a metabolizable oil in an emulsion-forming amount of up to 15% (by weight), and optionally a glycol ether-based surfactant in an emulsion-stabilizing amount of up to about 5% (by weight).

As used herein, the term “immunomodulator” refers to an agent which is able to modulate an immune response. An example of such modulation is an enhancement of antibody production. Another example of such modulation is an enhancement of a T cell response.

An “immunological response” to a composition or vaccine comprised of an antigen is the development in the host of a cellular- and/or antibody-mediated immune response to the composition or vaccine of interest. Usually, such a response consists of the subject producing antibodies, B cells, helper T cells, suppressor T cells, and/or cytotoxic T cells directed specifically to an antigen or antigens included in the composition or vaccine of interest.

The term “nucleotide” as used herein refers to a monomeric unit of a polynucleotide that consists of a heterocyclic base, a sugar, and one or more phosphate groups. The naturally occurring bases (guanine (G), adenine (A), cytosine (C), thymine (T), and uracil (U)) are derivatives of purine or pyrimidine, though it should be understood that naturally and non-naturally occurring base analogs are also included. The naturally occurring sugar is the pentose (five-carbon sugar) deoxyribose (which forms DNA) or ribose (which forms RNA), though it should be understood that naturally and non-naturally occurring sugar analogs are also included. Nucleotides are linked via phosphate bonds to form nucleic acids, or polynucleotides, though many other linkages are known in the art (such as, though not limited to, phosphorothioates, boranophosphates and the like).

The terms “nucleic acid” and “polynucleotide” as used herein refer to a polymeric form of nucleotides of any length, either ribonucleotides (RNA) or deoxyribonucleotides (DNA). These terms refer to the primary structure of the molecules and, thus, include double- and single-stranded DNA, and double- and single-stranded RNA. These terms include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs and modified polynucleotides such as, though not limited to, methylated and/or capped polynucleotides.

A “DNA molecule” refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in either its single-stranded form or a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences can be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the non-transcribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).

A DNA “coding sequence” or “coding region” is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in vivo when placed under the control of appropriate expression control sequences. The boundaries of the coding sequence (the “open reading frame” or “ORF”) are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and synthetic DNA sequences. A polyadenylation signal and transcription termination sequence are usually located 3′ to the coding sequence. The term “non-coding sequence” or “non-coding region” refers to regions of a polynucleotide sequence that are not translated into amino acids (e.g., 5′ and 3′ un-translated regions).
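The ORF boundaries described above (a start codon followed by an in-frame stop codon) can be sketched as a short search routine. This is a minimal illustration only: the function name is hypothetical, it scans a single strand, it assumes ATG as the start codon, and it ignores introns and expression control sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA forms of UAA, UAG, UGA


def find_first_orf(dna: str) -> Optional[str]:
    """Return the first ATG...stop open reading frame on the given strand.

    Scans left to right for an ATG, then reads in-frame codons until a
    stop codon is reached. Returns None if no complete ORF is found.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Step through codons in the frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop after this ATG; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


print(find_first_orf("ccATGGCCTGAtt"))  # ATGGCCTGA
```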

The term “reading frame” refers to one of the six possible reading frames, three in each direction, of the double-stranded DNA molecule. The reading frame that is used determines which codons are used to encode amino acids within the coding sequence of a DNA molecule.
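The six reading frames can be enumerated by reading codons at offsets 0, 1 and 2 on the given strand and on its reverse complement. The sketch below is illustrative; the function names and the frame labels (“+1” to “-3”) are hypothetical conventions rather than terms used in this application.

```python
from typing import Dict, List

# Watson-Crick complements for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_reading_frames(dna: str) -> Dict[str, List[str]]:
    """Split a DNA sequence into codons for all six reading frames."""
    frames: Dict[str, List[str]] = {}
    strands = (("+", dna.upper()), ("-", reverse_complement(dna)))
    for label, seq in strands:
        for offset in range(3):
            # Only complete three-base codons are kept.
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


for name, codons in six_reading_frames("ATGGCCTAA").items():
    print(name, codons)
```

For the example sequence, only frame “+1” reads ATG, GCC, TAA; the other five frames yield different codons from the same molecule, which is why the choice of frame determines the encoded amino acids.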

As used herein, an “antisense” nucleic acid molecule comprises a nucleotide sequence which is complementary to a “sense” nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule, complementary to an mRNA sequence or complementary to the coding strand of a gene. Accordingly, an antisense nucleic acid molecule can hydrogen bond to a sense nucleic acid molecule.

The term “base pair” (or “bp”) refers to a partnership of adenine (A) with thymine (T), or of cytosine (C) with guanine (G), in a double-stranded DNA molecule. In RNA, uracil (U) is substituted for thymine.

As used herein, a “codon” refers to the three nucleotides which, when transcribed and translated, encode a single amino acid residue or, in the case of UAA, UGA or UAG, encode a termination signal. Codons encoding amino acids are well known in the art and are provided for convenience herein in Table 1.

TABLE 1 Codon Usage Table

Codon  Amino acid     AA   Abbr.   Codon  Amino acid  AA   Abbr.
UUU    Phenylalanine  Phe  F       UCU    Serine      Ser  S
UUC    Phenylalanine  Phe  F       UCC    Serine      Ser  S
UUA    Leucine        Leu  L       UCA    Serine      Ser  S
UUG    Leucine        Leu  L       UCG    Serine      Ser  S
CUU    Leucine        Leu  L       CCU    Proline     Pro  P
CUC    Leucine        Leu  L       CCC    Proline     Pro  P
CUA    Leucine        Leu  L       CCA    Proline     Pro  P
CUG    Leucine        Leu  L       CCG    Proline     Pro  P
AUU    Isoleucine     Ile  I       ACU    Threonine   Thr  T
AUC    Isoleucine     Ile  I       ACC    Threonine   Thr  T
AUA    Isoleucine     Ile  I       ACA    Threonine   Thr  T
AUG    Methionine     Met  M       ACG    Threonine   Thr  T
GUU    Valine         Val  V       GCU    Alanine     Ala  A
GUC    Valine         Val  V       GCC    Alanine     Ala  A
GUA    Valine         Val  V       GCA    Alanine     Ala  A
GUG    Valine         Val  V       GCG    Alanine     Ala  A
UAU    Tyrosine       Tyr  Y       UGU    Cysteine    Cys  C
UAC    Tyrosine       Tyr  Y       UGC    Cysteine    Cys  C
UAA    Stop                        UGA    Stop
UAG    Stop                        UGG    Tryptophan  Trp  W
CAU    Histidine      His  H       CGU    Arginine    Arg  R
CAC    Histidine      His  H       CGC    Arginine    Arg  R
CAA    Glutamine      Gln  Q       CGA    Arginine    Arg  R
CAG    Glutamine      Gln  Q       CGG    Arginine    Arg  R
AAU    Asparagine     Asn  N       AGU    Serine      Ser  S
AAC    Asparagine     Asn  N       AGC    Serine      Ser  S
AAA    Lysine         Lys  K       AGA    Arginine    Arg  R
AAG    Lysine         Lys  K       AGG    Arginine    Arg  R
GAU    Aspartate      Asp  D       GGU    Glycine     Gly  G
GAC    Aspartate      Asp  D       GGC    Glycine     Gly  G
GAA    Glutamate      Glu  E       GGA    Glycine     Gly  G
GAG    Glutamate      Glu  E       GGG    Glycine     Gly  G

AA: amino acid; Abbr: abbreviation. It should be understood that the codons specified above are for RNA sequences. The corresponding codons for DNA have a T substituted for U. Optimal codon usage is indicated by codon usage frequencies for expressed genes, for example, as shown in the codon usage chart from the program “Human_High.cod” from the Wisconsin Sequence Analysis Package, Version 8.1, Genetics Computer Group, Madison, Wis. Codon usage is also described in, for example, R. Nussinov, “Eukaryotic Dinucleotide Preference Rules and Their Implications for Degenerate Codon Usage,” J. Mol. Biol. 149: 125-131 (1981). The codons which are most frequently used in highly expressed human genes are presumptively the optimal codons for expression in human host cells and, thus, form the basis for constructing a synthetic coding sequence.
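For readers who want to apply the standard genetic code of Table 1 programmatically, the following sketch builds the 64-codon lookup and translates a short sequence. The names CODON_TABLE and translate are hypothetical, and the sketch reflects only the codon-to-amino-acid assignments, not any codon usage frequencies.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order (third base varies
# fastest); "*" marks the stop codons UAA, UAG and UGA.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate an RNA (or DNA) coding sequence until the first stop codon."""
    rna = sequence.upper().replace("T", "U")  # DNA codons have T in place of U
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGGCUAA"))  # MG
```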

As used herein, a “wobble position” refers to the third position of a codon. Mutations in a DNA molecule within the wobble position of a codon, in some embodiments, result in silent or conservative mutations at the amino acid level. For example, there are four codons that encode glycine, i.e., GGU, GGC, GGA and GGG; thus mutation of any wobble position nucleotide to any other nucleotide does not result in a change at the amino acid level of the encoded protein and, therefore, is a silent substitution.

Accordingly, a “silent substitution” or “silent mutation” is one in which a nucleotide within a codon is modified, but does not result in a change in the amino acid residue encoded by the codon. Examples include mutations in the third position of a codon, as well as in the first position of certain codons such as in the codon “CGG” which, when mutated to AGG, still encodes Arg.
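The glycine and arginine examples above can be checked mechanically against the standard genetic code. The sketch below is illustrative; the helper name is_silent is hypothetical, and positions are numbered from 0, so the wobble position is index 2.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if replacing the base at `position` leaves the amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# Every wobble-position change in a glycine codon (GGN) is silent.
print(all(is_silent("GG" + b, 2, n) for b in BASES for n in BASES))  # True
# First-position change CGG -> AGG still encodes Arg.
print(is_silent("CGG", 0, "A"))  # True
# Not every wobble change is silent: AUG (Met) -> AUA (Ile).
print(is_silent("AUG", 2, "A"))  # False
```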

The phrase “conservative amino acid substitution” or “conservative mutation” refers to the replacement of one amino acid by another amino acid with a common property. A functional way to define common properties between individual amino acids is to analyze the normalized frequencies of amino acid changes between corresponding proteins of homologous organisms (Schulz, G. E. and R. H. Schirmer, Principles of Protein Structure, Springer-Verlag). According to such analyses, groups of amino acids can be defined where amino acids within a group exchange preferentially with each other, and therefore resemble each other most in their impact on the overall protein structure (Schulz, G. E. and R. H. Schirmer, Principles of Protein Structure, Springer-Verlag).

Examples of amino acid groups defined in this manner include: a “charged/polar group,” consisting of Glu, Asp, Asn, Gln, Lys, Arg and His; an “aromatic, or cyclic group,” consisting of Pro, Phe, Tyr and Trp; and an “aliphatic group” consisting of Gly, Ala, Val, Leu, Ile, Met, Ser, Thr and Cys.

Within each group, subgroups can also be identified. For example, the group of charged/polar amino acids can be sub-divided into the “positively-charged sub-group,” consisting of Lys, Arg and His; the “negatively-charged sub-group,” consisting of Glu and Asp; and the “polar sub-group,” consisting of Asn and Gln.

The aromatic or cyclic group can be sub-divided into the “nitrogen ring sub-group,” consisting of Pro, His and Trp; and the “phenyl sub-group,” consisting of Phe and Tyr.

The aliphatic group can be sub-divided into the “large aliphatic non-polar sub-group,” consisting of Val, Leu and Ile; the “aliphatic slightly-polar sub-group,” consisting of Met, Ser, Thr and Cys; and the “small-residue sub-group,” consisting of Gly and Ala.

Examples of conservative mutations include amino acid substitutions of amino acids within the sub-groups above, for example, Lys for Arg and vice versa such that a positive charge can be maintained; Glu for Asp and vice versa such that a negative charge can be maintained; Ser for Thr such that a free —OH can be maintained; and Gln for Asn such that a free —NH₂ can be maintained.

“Semi-conservative mutations” include amino acid substitutions of amino acids within the same groups listed above that do not share the same sub-group. For example, the mutations of Asp for Asn, or Asn for Lys, both involve amino acids within the same group but different sub-groups.

“Non-conservative mutations” involve amino acid substitutions between different groups, for example, Lys for Leu, or Phe for Ser.
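The three categories defined above can be expressed as a simple lookup over the listed groups and sub-groups. The sketch below is one possible reading, not a definitive scheme: the dictionary and function names are hypothetical, and because His is listed in both the positively-charged and nitrogen ring sub-groups, the code assumes that sharing any sub-group is enough to call a substitution conservative.

```python
GROUPS = {
    "charged/polar": {"Glu", "Asp", "Asn", "Gln", "Lys", "Arg", "His"},
    "aromatic/cyclic": {"Pro", "Phe", "Tyr", "Trp"},
    "aliphatic": {"Gly", "Ala", "Val", "Leu", "Ile", "Met", "Ser", "Thr", "Cys"},
}
SUB_GROUPS = {
    "positively-charged": {"Lys", "Arg", "His"},
    "negatively-charged": {"Glu", "Asp"},
    "polar": {"Asn", "Gln"},
    "nitrogen ring": {"Pro", "His", "Trp"},
    "phenyl": {"Phe", "Tyr"},
    "large aliphatic non-polar": {"Val", "Leu", "Ile"},
    "aliphatic slightly-polar": {"Met", "Ser", "Thr", "Cys"},
    "small-residue": {"Gly", "Ala"},
}


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a substitution using the groups and sub-groups listed above."""
    # Assumption: any shared sub-group makes the substitution conservative.
    if any(original in s and replacement in s for s in SUB_GROUPS.values()):
        return "conservative"
    if any(original in g and replacement in g for g in GROUPS.values()):
        return "semi-conservative"
    return "non-conservative"


for pair in (("Lys", "Arg"), ("Asp", "Asn"), ("Asn", "Lys"), ("Lys", "Leu")):
    print(pair, classify_substitution(*pair))
```

With these assumptions, Lys for Arg is conservative, Asp for Asn and Asn for Lys are semi-conservative, and Lys for Leu is non-conservative, matching the examples given in the text.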

The term “amino acid residue” refers to the radical derived from the corresponding alpha-amino acid by eliminating the OH portion of the carboxyl group and the H portion of the alpha amino group. For the most part, the amino acids used in the application are those naturally occurring amino acids found in proteins, or the naturally occurring anabolic or catabolic products of such amino acids which contain amino and carboxyl groups. Alternatively, un-natural amino acids can be incorporated into proteins to facilitate the chemical conjugation to other proteins, toxins, small organic compounds or anti-cancer agents (Datta et al., J. Am. Chem. Soc. 2002; 124(20):5652-3). The abbreviations used herein for designating the amino acids and the protective groups are based on recommendations of the IUPAC-IUB Commission on Biochemical Nomenclature (see Biochemistry (1972) 11: 1726-1732). The term “amino acid residue” also includes analogs, derivatives and congeners of any specific amino acid referred to herein, as well as C-terminal or N-terminal protected amino acid derivatives (e.g., modified with an N-terminal or C-terminal protecting group). For example, the present invention contemplates the use of amino acid analogs wherein a side chain is lengthened or shortened while still providing a carboxyl, amino or other reactive precursor functional group for cyclization, as well as amino acid analogs having variant side chains with appropriate functional groups.

The term “amino acid side chain” is that part of an amino acid exclusive of the —CH—(NH₂)COOH portion, as defined by K. D. Kopple, “Peptides and Amino Acids,” W. A. Benjamin Inc., New York and Amsterdam, 1966, pages 2 and 33; examples of such side chains of the common amino acids are —CH₂CH₂SCH₃ (the side chain of methionine), —CH(CH₃)—CH₂CH₃ (the side chain of isoleucine), —CH₂CH(CH₃)₂ (the side chain of leucine) or H— (the side chain of glycine).

The amino acid residues described herein are preferred to be in the “L” isomeric form. However, residues in the “D” isomeric form can be substituted for any L-amino acid residue, as long as the desired functional property of antibody (immunoglobulin)-binding is retained by the polypeptide. NH₂ refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide.

An “amino acid motif” is a sequence of amino acids, optionally a generic set of conserved amino acids, associated with a particular functional activity.

As used herein, the terms “protein,” “peptide” and “polypeptide” are used interchangeably to refer to polymers of amino acid residues of any length connected to one another by peptide bonds between the alpha-amino group and carboxy group of contiguous amino acid residues. Polypeptides, proteins and peptides can exist as linear polymers, branched polymers or in circular form. These terms also include forms that are post-translationally modified in vivo or chemically modified during synthesis.

It should be noted that all amino acid residue sequences are represented herein by formulae whose left and right orientation is in the conventional direction of amino terminus to carboxy terminus. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino acid residues.

The terms “gene,” “recombinant gene” and “gene construct” as used herein refer to a DNA molecule, or portion of a DNA molecule, that encodes a protein or a portion thereof. The DNA molecule can contain an open reading frame encoding the protein (as exon sequences) and can further include intron sequences. The term “intron” as used herein refers to a DNA sequence present in a given gene which is not translated into protein and is found in some, but not all cases, between exons. It can be desirable for the gene to be operably linked to (or it can comprise) one or more promoters, enhancers, repressors and/or other regulatory sequences to modulate the activity or expression of the gene, as is well known in the art.

As used herein, a “complementary DNA” or “cDNA” includes recombinant polynucleotides synthesized by reverse transcription of mRNA and from which intervening sequences (introns) have been removed.

The term “operably linked” as used herein describes the relationship between two polynucleotide regions such that they are functionally related or coupled to each other. For example, a promoter (or other expression control sequence) is operably linked to a coding sequence if it controls (and is capable of effecting) the transcription of the coding sequence. Although an operably linked promoter can be located upstream of the coding sequence, it is not necessarily contiguous with it.

“Expression control sequences” are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, internal ribosome entry sites (IRES) and the like, that provide for the expression of a coding sequence in a host cell. Exemplary expression control sequences are described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).

A “promoter” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. As used herein, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. A transcription initiation site (conveniently defined by mapping with nuclease S1) can be found within a promoter sequence, as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters can often, but not always, contain “TATA” boxes and “CAT” boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the −10 and −35 consensus sequences.

A large number of promoters, including constitutive, inducible and repressible promoters, from a variety of different sources are well known in the art. Representative sources include, for example, viral, mammalian, insect, plant, yeast, and bacterial cell types, and suitable promoters from these sources are readily available, or can be made synthetically, based on sequences publicly available online or, for example, from depositories such as the ATCC as well as other commercial or individual sources. Promoters can be unidirectional (i.e., initiate transcription in one direction) or bi-directional (i.e., initiate transcription in either a 3′ or 5′ direction). Non-limiting examples of promoters include the T7 bacterial expression system, the pBAD (araA) bacterial expression system, the cytomegalovirus (CMV) promoter, the SV40 promoter, and the RSV promoter. Inducible promoters include the Tet system (U.S. Pat. Nos. 5,464,758 and 5,814,618), the Ecdysone inducible system (No et al., Proc. Natl. Acad. Sci. (1996) 93 (8): 3346-3351), the T-RE_(x)™ system (Invitrogen, Carlsbad, Calif.), LacSwitch® (Stratagene, San Diego, Calif.) and the Cre-ER^(T) tamoxifen inducible recombinase system (Indra et al., Nuc. Acid. Res. (1999) 27 (22): 4324-4327; Nuc. Acid. Res. (2000) 28 (23): e99; U.S. Pat. No. 7,112,715; and Kramer & Fussenegger, Methods Mol. Biol. (2005) 308: 123-144), or any promoter known in the art suitable for expression in the desired cells.

As used herein, a “minimal promoter” refers to a partial promoter sequence which defines the transcription start site but which by itself is not capable of initiating transcription efficiently, if at all. The activity of such minimal promoters depends on the binding of activators, such as a tetracycline-controlled transactivator, to operably linked binding sites.

The terms “IRES” or “internal ribosome entry site” refer to a polynucleotide element that acts to enhance the translation of a coding sequence encoded within a polycistronic messenger RNA. IRES elements mediate the initiation of translation by directly recruiting and binding ribosomes to a messenger RNA (mRNA) molecule, bypassing the 7-methyl guanosine cap involved in typical ribosome scanning. The presence of an IRES sequence can increase the level of cap-independent translation of a desired protein. Early publications descriptively refer to IRES sequences as “translation enhancers.” For example, cardioviral RNA “translation enhancers” are described in U.S. Pat. No. 4,937,190 to Palmenberg, et al. and U.S. Pat. No. 5,770,428 to Boris-Lawrie.

The terms “nuclear localization signal” and “NLS” refer to a domain, or domains, capable of mediating the nuclear import of a protein or polynucleotide, or retention thereof, within the nucleus of a cell. A “strong nuclear import signal” represents a domain or domains capable of mediating greater than 90% subcellular localization in the nucleus when operatively linked to a protein of interest. Representative examples of NLSs include, but are not limited to, monopartite nuclear localization signals, bipartite nuclear localization signals and N and C-terminal motifs. N-terminal basic domains usually conform to the consensus sequence K-K/R-X-K/R, which was first discovered in the SV40 large T antigen and which represents a monopartite NLS. One non-limiting example of an N-terminal basic domain NLS is PKKKRKV (SEQ ID NO: 340). Also known are bipartite nuclear localization signals, which contain two clusters of basic amino acids separated by a spacer of about 10 amino acids, as exemplified by the NLS from nucleoplasmin: KR[PAATKKAGQA]KKKK (SEQ ID NO: 366). N and C-terminal motifs include, for example, the acidic M9 domain of hnRNP A1, the sequence KIPIK (SEQ ID NO: 381) in yeast transcription repressor Matα2 and the complex signals of U snRNPs. Most of these NLSs appear to be recognized directly by specific receptors of the importin β family.
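The monopartite consensus K-K/R-X-K/R and the bipartite layout described above lend themselves to a simple pattern search. The sketch below is illustrative only: the pattern names are hypothetical, and the bipartite pattern fixes the spacer at exactly 10 residues and the cluster sizes at two and at least three basic residues, which is a simplification of “about 10 amino acids.”

```python
import re

# K, then K or R, then any residue, then K or R (consensus K-K/R-X-K/R).
MONOPARTITE_NLS = re.compile(r"K[KR].[KR]")
# Two basic residues, a 10-residue spacer, then a second basic cluster.
# The exact spacer length and cluster sizes are assumptions for illustration.
BIPARTITE_NLS = re.compile(r"[KR]{2}.{10}[KR]{3,}")

sv40_nls = "PKKKRKV"                 # SV40 large T antigen example from the text
nucleoplasmin_nls = "KRPAATKKAGQAKKKK"  # nucleoplasmin example, brackets removed

match = MONOPARTITE_NLS.search(sv40_nls)
print(match.group() if match else None)            # KKKR
print(bool(BIPARTITE_NLS.search(nucleoplasmin_nls)))  # True
```

A pattern match of this kind only flags candidate motifs; it says nothing about whether a sequence actually mediates nuclear import.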

The term “enhancer” as used herein refers to a DNA sequence that increases transcription of, for example, a gene or coding sequence to which it is operably linked. Enhancers can be located many kilobases away from the coding sequence and can mediate the binding of regulatory factors, patterns of DNA methylation or changes in DNA structure. A large number of enhancers from a variety of different sources are well known in the art and available as or within cloned polynucleotides (from, e.g., depositories such as the ATCC as well as other commercial or individual sources). A number of polynucleotides comprising promoters (such as the commonly-used CMV promoter) also comprise enhancer sequences. Operably linked enhancers can be located upstream, within, or downstream of coding sequences. The term “Ig enhancers” refers to enhancer elements derived from enhancer regions mapped within the Ig locus (such enhancers include, for example, the heavy chain (mu) 5′ enhancers, light chain (kappa) 5′ enhancers, kappa and mu intronic enhancers, and 3′ enhancers; see, e.g., Paul W E (ed) Fundamental Immunology, 3^(rd) Edition, Raven Press, New York (1993) pages 353-363; U.S. Pat. No. 5,885,827).

“Terminator sequences” are those that result in termination of transcription. Termination sequences are known in the art and include, but are not limited to, poly A (e.g., Bgh Poly A and SV40 Poly A) terminators. A transcriptional termination signal will typically include a region of 3′ untranslated region (or “3′ ut”), an optional intron (also referred to as intervening sequence or “IVS”) and one or more polyadenylation signals (“p(A)” or “pA”). Terminator sequences may also be referred to as “IVS-pA,” “IVS+p(A),” “3′ ut+p(A)” or “3′ ut/p(A).” Natural or synthetic terminators can be used as a terminator region.

The terms “polyadenylation,” “polyadenylation sequence,” “polyadenylation signal,” “Poly A,” “p(A)” or “pA” refer to a nucleic acid sequence present in an RNA transcript that allows for the transcript, when in the presence of the polyadenyl transferase enzyme, to be polyadenylated. Many polyadenylation signals are known in the art. Non-limiting examples include the human variant growth hormone polyadenylation signal, the SV40 late polyadenylation signal and the bovine growth hormone polyadenylation signal.

The term “splice site” as used herein refers to polynucleotides that are capable of being recognized by the splicing machinery of a eukaryotic cell as suitable for being cut and/or ligated to a corresponding splice site. Splice sites allow for the excision of introns present in a pre-mRNA transcript. In one example, the 5′ portion of the splice site is referred to as the splice donor and the 3′ corresponding splice site is referred to as the acceptor splice site. The term splice site includes, for example, naturally occurring splice sites, engineered splice sites, for example, synthetic splice sites, canonical or consensus splice sites, and/or non-canonical splice sites, for example, cryptic splice sites.

A “signal sequence” can be included before the coding sequence. This sequence encodes a signal peptide, N-terminal to the polypeptide, that communicates to the host cell to direct the polypeptide to the cell surface or secrete the polypeptide into the media, and this signal peptide is clipped off by the host cell before the protein leaves the cell. Signal sequences can be found associated with a variety of proteins native to prokaryotes and eukaryotes.

“Post-translational modification” can encompass any one of, or a combination of, modification(s), including covalent modification(s), which a protein undergoes after translation is complete and after being released from the ribosome, or on the nascent polypeptide co-translationally. Post-translational modification includes but is not limited to phosphorylation, myristylation, ubiquitination, glycosylation, coenzyme attachment, methylation, S-nitrosylation and acetylation. Post-translational modification can modulate or influence the activity of a protein, its intracellular or extracellular destination, its stability or half-life, and/or its recognition by ligands, receptors or other proteins. Post-translational modification can occur in cell organelles, in the nucleus or cytoplasm, or extracellularly.

The term “primer” as used herein refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer can be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer can depend upon many factors, including temperature, source of primer and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, an oligonucleotide primer can contain about 15 to about 25 or more nucleotides, although it can contain fewer nucleotides.

The primers herein are selected to be “substantially” complementary to different strands of a particular target polynucleotide sequence. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment can be attached to the 5′ end of the primer, with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to hybridize therewith and thereby form the template for the synthesis of the extension product.

As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

The term “multiple cloning site” as used herein refers to a segment of a vector polynucleotide which contains recognition sequences for several different restriction enzymes.
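As a non-limiting illustration, a candidate multiple cloning site can be checked for restriction enzyme recognition sequences by simple string search. The sketch below uses the well-known recognition sequences of EcoRI, BamHI and HindIII; the function name and the example segment are hypothetical and are not taken from any vector described herein.

```python
# Recognition sequences of three widely used restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_restriction_sites(dna, sites=RECOGNITION_SITES):
    """Map each enzyme to the 0-based positions of its recognition sequence.

    Hypothetical helper for illustration; enzymes with no site are omitted.
    """
    dna = dna.upper()
    found = {}
    for enzyme, site in sites.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)
        if positions:
            found[enzyme] = positions
    return found


# Hypothetical segment containing one EcoRI and one BamHI site.
hypothetical_mcs = "AAGAATTCGGATCCTT"
print(find_restriction_sites(hypothetical_mcs))
```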

A “replicon” is any genetic element (e.g., plasmid, episome, chromosome, YAC, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control, and containing autonomous replicating sequences.

A “vector” or “cloning vector” is a replicon, such as a plasmid, phage or cosmid, into which another polynucleotide segment can be introduced so as to bring about the replication of the inserted segment. Vectors can exist as circular, double-stranded DNA, and range in size from a few kilobases (kb) to hundreds of kb. Preferred cloning vectors have been modified from naturally occurring plasmids to facilitate the cloning and recombinant manipulation of polynucleotide sequences. Many such vectors are well known in the art; see, for example, Sambrook (In: “Molecular Cloning: A Laboratory Manual,” second edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring Harbor Laboratory, (1989)); Maniatis, In: Cell Biology: A Comprehensive Treatise, Vol. 3, Gene Sequence Expression, Academic Press, NY, pp. 563-608 (1980).

The term “expression vector” as used herein refers to a vector used for expressing certain polynucleotides within a host cell or in vitro expression system. The term includes plasmids, episomes, cosmids, retroviruses or phages; the expression vector can be used to express a DNA sequence encoding a desired protein and in one aspect includes a transcriptional unit comprising an assembly of expression control sequences. The choice of promoter and other regulatory elements can vary according to the intended host cell or in vitro expression system.

As used herein, a “recombination system” refers to one which allows for recombination between a vector of the present application and a chromosome for incorporation of a gene of interest. Recombination systems are known in the art and include Cre/Lox systems and FLP-IN systems.

As used herein, an “in vitro expression system” refers to cell-free systems that enable the transcription, or coupled transcription and translation, of DNA templates. Such systems include, for example, the classical rabbit reticulocyte system, as well as novel cell-free synthesis systems (J. Biotechnol. (2004) 110 (3): 257-63; Biotechnol Annu. Rev. (2004) 10:1-30).

As used herein, a “Cre/Lox” system refers to one such as described by Abremski et al., Cell, 32: 1301-1311 (1983) for a site-specific recombination system of bacteriophage P1. Methods of using Cre-Lox systems are known in the art; see, for example, U.S. Pat. No. 4,959,317, which is hereby incorporated in its entirety by reference. The system consists of a recombination site designated loxP and a recombinase designated Cre. In methods for producing site-specific recombination of DNA in eukaryotic cells, DNA sequences having first and second lox sites can be introduced into eukaryotic cells and contacted with Cre, thereby producing recombination at the lox sites.

As used herein, “FLP-IN” recombination refers to systems in which a polynucleotide activation/inactivation and site-specific integration system has been developed for mammalian cells. The system is based on the recombination of transfected sequences by FLP, a recombinase derived from Saccharomyces. In several cell lines, FLP has been shown to rapidly and precisely recombine copies of its specific target sequence. FLP-IN systems have been described in, for example, U.S. Pat. Nos. 5,654,182 and 5,677,177.

The term “transfection,” “transformation,” or “transduction” as used herein refers to the introduction of one or more exogenous polynucleotides into a host cell by using one or more physical or chemical methods. Many transfection techniques are known to those of ordinary skill in the art including, but not limited to, calcium phosphate DNA co-precipitation (see Methods in Molecular Biology, Vol. 7, Gene Transfer and Expression Protocols, Ed. E. J. Murray, Humana Press (1991)); DEAE-dextran; electroporation; cationic liposome-mediated transfection; tungsten particle-facilitated microparticle bombardment (Johnston, S. A., Nature 346: 776-777 (1990)); and strontium phosphate DNA co-precipitation (Brash D. E. et al. Molec. Cell. Biol. 7: 2031-2034 (1987)). Phage or retroviral vectors can be introduced into host cells after growth of infectious particles in packaging cells that are commercially available.

The terms “cells,” “cell cultures,” “cell line,” “recombinant host cells,” “recipient cells” and “host cells” are often used interchangeably and will be clear from the context in which they are used. These terms include the primary subject cells and any progeny thereof, without regard to the number of transfers. It should be understood that not all progeny are exactly identical to the parental cell (due to deliberate or inadvertent mutations or differences in environment); however, such altered progeny are included in these terms, so long as the progeny retain the same functionality as that of the originally transformed cell. For example, and without limitation, such a characteristic might be the ability to produce a particular recombinant protein. A “mutator positive cell line” is a cell line containing cellular factors that are sufficient to work in combination with other vector elements to effect hypermutation. The cell line can be any of those known in the art or described herein. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis.

A “reporter gene” refers to a polynucleotide that confers the ability to be specifically detected (or detected and selected) when expressed within a cell of interest. Numerous reporter gene systems are known in the art and include, for example, alkaline phosphatase (Berger, J., et al., Gene 66: 1-10 (1988); Kain, S R., Methods Mol. Biol. 63: 49-60 (1997)), beta-galactosidase (U.S. Pat. No. 5,070,012), chloramphenicol acetyltransferase (Gorman et al., Mol. Cell. Biol. 2: 1044-51 (1982)), beta glucuronidase, peroxidase, beta lactamase (U.S. Pat. Nos. 5,741,657, 5,955,604), catalytic antibodies, luciferases (U.S. Pat. Nos. 5,221,623; 5,683,888; 5,674,713; 5,650,289; 5,843,746) and naturally fluorescent proteins (Tsien, R Y, Annu. Rev. Biochem. 67: 509-544 (1998)). The term “reporter gene” also includes any peptide which can be specifically detected based on the use of one or more antibodies, epitopes, binding partners, substrates, modifying enzymes, receptors, or ligands that are capable of, or desired to (or desired not to), interact with the peptide of interest to create a detectable signal. Reporter genes also include genes that can modulate cellular phenotype.

The term “selectable marker gene” as used herein refers to polynucleotides that allow cells carrying the polynucleotide to be specifically selected for or against, in the presence of a corresponding selective agent. Selectable markers can be positive, negative or bifunctional. Positive selectable markers allow selection for cells carrying the marker, whereas negative selectable markers allow cells carrying the marker to be selectively eliminated. The selectable marker polynucleotide can either be directly linked to the polynucleotides to be expressed, or introduced into the same cell by co-transfection. A variety of such marker polynucleotides have been described, including bifunctional (i.e., positive/negative) markers (see, e.g., WO 92/08796, published May 29, 1992, and WO 94/28143, published Dec. 8, 1994), hereby incorporated in their entirety by reference herein. Specific examples of drug-resistance selectable markers include, but are not limited to, those conferring resistance to ampicillin, tetracycline, blasticidin, puromycin, hygromycin, ouabain or kanamycin. Specific examples of selectable markers are those, for example, that encode proteins that confer resistance to cytostatic or cytocidal drugs, such as the DHFR protein, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567 (1980); O'Hare et al., Proc. Natl. Acad. Sci. USA, 78:1527 (1981)); the GPT protein, which confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA, 78:2072 (1981)); the neomycin resistance marker, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol., 150:1 (1981)); the hygromycin resistance protein, which confers resistance to hygromycin (Santerre et al., Gene, 30:147 (1984)); murine Na+, K+-ATPase alpha subunit, which confers resistance to ouabain (Kent et al., Science, 237:901-903 (1987)); and the Zeocin™ resistance marker (available commercially from Invitrogen). In addition, the herpes simplex virus thymidine kinase (Wigler et al., Cell, 11:223 (1977)), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026 (1962)), and adenine phosphoribosyltransferase (Lowy et al., Cell, 22:817 (1980)) can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Glutamine synthetase (GS) permits the growth of cells in glutamine-free media (see, e.g., U.S. Pat. Nos. 5,122,464; 5,770,359; and 5,827,739). Other selectable markers encode, for example, puromycin N-acetyl transferase or adenosine deaminase.

“Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology and identity can each be determined by comparing a position in each sequence which can be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology/similarity or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. A sequence which is “unrelated” or “non-homologous” shares less than 40% identity, less than 35% identity, less than 30% identity, or less than 25% identity with a sequence of the present invention. In comparing two sequences, the absence of residues (amino acids or nucleic acids) or presence of extra residues also decreases the identity and homology/similarity.

The term “homology” describes a mathematically based comparison of sequence similarities which is used to identify genes or proteins with similar functions or motifs. The nucleic acid and protein sequences of the present invention can be used as a “query sequence” to perform a search against public databases to, for example, identify other family members, related sequences or homologs. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (see www.ncbi.nlm.nih.gov).

As used herein, “identity” means the percentage of identical nucleotide or amino acid residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. Identity can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carrillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). Methods to determine identity are designed to give the largest match between the sequences tested. Moreover, methods to determine identity are codified in publicly available computer programs. Computer program methods to determine identity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S. F. et al., J. Molec. Biol. 215: 403-410 (1990) and Altschul et al. Nuc. Acids Res. 25: 3389-3402 (1997)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)). The well known Smith Waterman algorithm can also be used to determine identity.
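For illustration, percent identity over two sequences that have already been aligned (for example, by one of the programs listed above) can be computed as sketched below. This is a minimal sketch under stated assumptions: gaps are written as "-", a gapped column counts as a non-identical position (consistent with the statement that absent or extra residues decrease identity), and the denominator is the number of alignment columns. Other programs may use a different denominator, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumptions (not from the source): gaps are "-", gapped columns count
    as mismatches, and columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Three of four columns are identical; the gap counts against identity.
print(percent_identity("AC-T", "ACGT"))
```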

A “heterologous” region of a DNA sequence is an identifiable segment of DNA within a larger DNA sequence that is not found in association with the larger sequence in nature. Thus, when the heterologous region encodes a mammalian gene, the gene can usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. Another example of a heterologous coding sequence is a sequence where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns or synthetic sequences having codons or motifs different than the unmodified gene). Allelic variations or naturally-occurring mutational events do not give rise to a heterologous region of DNA as defined herein.

The term “activation-induced cytidine deaminase” (or “AID”) refers to members of the AID/APOBEC family of RNA/DNA editing cytidine deaminases capable of mediating the deamination of cytosine to uracil within a DNA sequence. (See, e.g., Conticello et al., Mol. Biol. Evol. 22(2): 367-377 (2005), “Evolution of the AID/APOBEC Family of Polynucleotide (Deoxy)cytidine Deaminases” and U.S. Pat. No. 6,815,194.) Suitable AID enzymes include all vertebrate forms of the enzyme, including, for example, primate, rodent, avian and bony fish. Representative examples of AID enzymes include, without limitation, human (GenBank Accession No. NP_065712), rat, chicken, canine and mouse (GenBank Accession No. NP_033775) forms. The term “AID homolog” refers to the enzymes of the Apobec family and includes, for example, Apobec and, in particular, can be selected from Apobec family members such as Apobec-1, Apobec3C or Apobec3G (described, for example, by Jarmuz et al., Genomics, 79: 285-296 (2002)). AID and AID homologs also include, without limitation, modified polypeptides (e.g., mutants or muteins) that retain the ability to deaminate a polynucleotide sequence. The term “AID activity” includes activity mediated by AID and AID homologs.

The term “transition mutations” refers to base changes in a DNA sequence in which a pyrimidine (cytidine (C) or thymidine (T)) is replaced by another pyrimidine, or a purine (adenosine (A) or guanosine (G)) is replaced by another purine.

The term “transversion mutations” refers to base changes in a DNA sequence in which a pyrimidine (cytidine (C) or thymidine (T)) is replaced by a purine (adenosine (A) or guanosine (G)), or a purine is replaced by a pyrimidine.
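The two definitions above can be illustrated by a short routine that classifies a single-base substitution. This is a minimal sketch; the function name and return labels are hypothetical and are chosen only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA change as a transition or transversion.

    Hypothetical helper: returns "none" when the base is unchanged.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "none"
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# C to T stays within the pyrimidines; C to A crosses to a purine.
print(classify_substitution("C", "T"))
print(classify_substitution("C", "A"))
```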

The term “base excision repair” refers to a DNA repair pathway that removes single bases from DNA, such as uridine nucleotides arising by deamination of cytidine. Repair is initiated by uracil glycosylase, which recognizes and removes uracil from single- or double-stranded DNA to leave an abasic site.

The term “mismatch repair” refers to the repair pathway that recognizes and corrects mismatched bases, such as those that arise from errors of chromosomal DNA replication.

The term “pol eta” (also called PolH, RAD30A, XPV, XP-V) refers to a low-fidelity DNA polymerase that plays a role in replication through lesions, for instance, replication through UV-induced thymidine dimers. The gene for pol eta is defective in the variant type of xeroderma pigmentosum, XPV. On non-damaged DNA, pol eta incorporates incorrect nucleotides at a rate of approximately 3 per 100 bp, and is especially error-prone when replicating through templates containing WA dinucleotides (W=A or T) (Gearhart and Wood, 2001). Pol eta has been shown to play an important role as an A/T mutator during SHM in immunoglobulin variable genes (Zeng et al., 2001). Representative examples of pol eta include, without limitation, human (GenBank Accession No. BAA81666), rat (GenBank Accession No. XP_001066743), chicken (GenBank Accession No. NP_001001304), canine (GenBank Accession No. XP_532150) and mouse (GenBank Accession No. NP_109640) forms.

The term “pol theta” (also called PolQ) refers to a low-fidelity DNA polymerase that may play a role in crosslink repair (Gearhart and Wood, Nature Rev Immunol 1: 187-192 (2001)) and contains an intrinsic ATPase-helicase domain (Kawamura et al., Int. J. Cancer 109(1):9-16 (2004)). The polymerase is able to efficiently replicate through an abasic site by functioning both as a mispair inserter and as a mispair extender (Zan et al., EMBO Journal 24, 3757-3769 (2005)). Representative examples of pol theta include, without limitation, human (GenBank Accession No. NP_955452), rat (GenBank Accession No. XP_221423), chicken (GenBank Accession No. XP_416549), canine (GenBank Accession No. XP_545125) and mouse (GenBank Accession No. NP_084253) forms. Pol eta and pol theta are sometimes referred to collectively as “error prone polymerases.”

As used herein, the term “SHM hot spot” or “hot spot” or “hot spot motif” refers to a polynucleotide sequence or motif of 3-6 nucleotides (i.e., 1-2 codons) that exhibits an increased tendency to undergo SHM, as determined via a statistical analysis of SHM mutations in antibody genes (see Tables 2 and 3, which provide a relative ranking of various motifs for SHM, and Table 7, which lists canonical hot spots and cold spots). The statistical analysis can be extrapolated to analysis of SHM mutations in non-antibody genes as described elsewhere herein. For the purposes of graphical representations of hot spots in Figures, the first nucleotide of a canonical hot spot is represented by the letter “H.”

Likewise, as used herein, a “SHM cold spot” or “cold spot” or “cold spot motif” refers to a polynucleotide or motif of 3-6 nucleotides (i.e., 1-2 codons) that exhibits a decreased tendency to undergo SHM, as determined via a statistical analysis of SHM mutations in antibody genes (see Tables 2 and 3, which provide a relative ranking of various motifs for SHM, and Table 7, which lists canonical hot spots and cold spots). The statistical analysis can be extrapolated to analysis of SHM mutations in non-antibody genes as described elsewhere herein. For the purposes of graphical representations of cold spots in Figures, the first nucleotide of a canonical cold spot is represented by the letter “C.”

The term “somatic hypermutation motif” or “SHM motif” refers to a polynucleotide sequence that includes, or can be altered to include, one or more hot spots and/or cold spots, and which encodes a defined set of amino acids. SHM motifs can be of any size, but are conveniently based around polynucleotides of about 2 to about 20 nucleotides in size; preferred SHM motifs range from about 3 to about 9 nucleotides in size. SHM motifs can include any combination of canonical hot spots and/or cold spots, or may lack both canonical hot spots and cold spots.

The term “preferred SHM motif” refers to one or more preferred SHM codons (see Table 7 infra).

The terms “preferred hot spot SHM codon,” “preferred hot spot SHM motif,” “preferred SHM hot spot codon” and “preferred SHM hot spot motif” all refer to a codon including, but not limited to, codons AAC, TAC, TAT, AGT and AGC. Such sequences may be embedded within the context of a larger SHM motif, recruit SHM-mediated mutagenesis and generate targeted amino acid diversity at that codon.

A polynucleotide sequence has been “optimized for SHM” if the polynucleotide, or a portion thereof, has been altered to increase or decrease the frequency and/or location of hot spots and/or cold spots within the polynucleotide. A polynucleotide has been made “susceptible to SHM” if the polynucleotide, or a portion thereof, has been altered to increase the frequency (density) and/or location of hot spots within the polynucleotide or to decrease the frequency and/or location of cold spots within the polynucleotide. Conversely, a polynucleotide has been made “resistant to SHM” if the polynucleotide, or a portion thereof, has been altered to decrease the frequency and/or location of hot spots and/or has been altered to increase the frequency (density) and/or location of cold spots within the polynucleotide. In one embodiment, a sequence can be prepared that has a greater or lesser susceptibility (or rate) to undergo SHM-mediated mutagenesis by altering the codon usage, and/or the amino acids encoded by the polynucleotide sequence, relative to the unmodified polynucleotide.
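By way of a simplified, non-limiting illustration of altering codon usage to increase hot spot density, the sketch below replaces serine and asparagine codons with synonymous codons from the preferred hot spot SHM codon list (AAC, TAC, TAT, AGT, AGC). The swap table, function names and example sequence are hypothetical. The sketch considers in-frame codons only and ignores the motif rankings of Tables 2, 3 and 7 as well as motifs that span codon boundaries, so it should not be read as the design procedure itself.

```python
# Preferred hot spot SHM codons named in the definitions above.
PREFERRED_HOT_CODONS = {"AAC", "TAC", "TAT", "AGT", "AGC"}

# Hypothetical synonymous swaps (standard genetic code): each key encodes
# the same amino acid as its replacement.
SYNONYMOUS_TO_HOT = {
    "TCT": "AGC", "TCC": "AGC", "TCA": "AGC", "TCG": "AGC",  # serine
    "AAT": "AAC",                                            # asparagine
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def count_hot_codons(seq):
    """Count in-frame codons that are on the preferred hot spot list."""
    return sum(codon in PREFERRED_HOT_CODONS for codon in split_codons(seq))


def make_susceptible(seq):
    """Swap Ser/Asn codons for synonymous preferred hot spot codons."""
    return "".join(SYNONYMOUS_TO_HOT.get(c, c) for c in split_codons(seq))


original = "ATGTCTAATTACGGC"           # hypothetical: Met-Ser-Asn-Tyr-Gly
recoded = make_susceptible(original)   # same protein, more hot codons
print(recoded, count_hot_codons(original), count_hot_codons(recoded))
```

In this example the encoded amino acid sequence is unchanged while the number of in-frame preferred hot spot codons rises from one to three.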

Optimization of a polynucleotide sequence refers to modifying about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the nucleotides in the polynucleotide sequence. Optimization of a polynucleotide sequence also refers to modifying about 1, about 2, about 3, about 4, about 5, about 10, about 20, about 25, about 50, about 75, about 90, about 95, about 96, about 97, about 98, about 99, about 100, about 200, about 300, about 400, about 500, about 750, about 1000, about 1500, about 2000, about 2500, about 3000 or more, or any range therein of the nucleotides in the polynucleotide sequence such that some or all of the nucleotides are optimized for SHM-mediated mutagenesis. Reduction in the frequency (density) of hot spots and/or cold spots refers to reducing about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the hot spots or cold spots in a polynucleotide sequence. Increasing the frequency (density) of hot spots and/or cold spots refers to increasing about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the hot spots or cold spots in a polynucleotide sequence.

The position or reading frame of a hot spot or cold spot is also a factor governing whether SHM-mediated mutagenesis results in a mutation that is silent with regard to the resulting amino acid sequence, or causes conservative, semi-conservative or non-conservative changes at the amino acid level. As discussed below, these design parameters can be manipulated to further enhance the relative susceptibility or resistance of a nucleotide sequence to SHM. Thus both the degree of SHM recruitment and the reading frame of the motif are considered in the design of SHM-susceptible and SHM-resistant polynucleotide sequences.
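The dependence of the outcome on reading frame can be illustrated by asking whether a given single-base change within a codon alters the encoded amino acid. The sketch below uses the standard genetic code and distinguishes only silent, missense and nonsense outcomes; it does not grade missense changes as conservative, semi-conservative or non-conservative, which would require a substitution scheme not specified here. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mutation_effect(codon, position, new_base):
    """Classify a single-base change at codon position 0, 1 or 2.

    Hypothetical helper: returns "silent", "missense" or "nonsense".
    """
    codon, new_base = codon.upper(), new_base.upper()
    if len(codon) != 3 or position not in (0, 1, 2) or new_base not in BASES:
        raise ValueError("expected a 3-base codon, position 0-2, base in TCAG")
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    return "nonsense" if after == "*" else "missense"


# The same C-to-T style change is silent at the third position of AGC
# (Ser to Ser) but a G-to-A change at the second position gives Asn.
print(mutation_effect("AGC", 2, "T"))
print(mutation_effect("AGC", 1, "A"))
```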

As used herein, “somatic hypermutation” or “SHM” refers to the mutation of a polynucleotide initiated by, or associated with, the action of activation-induced cytidine deaminase, uracil glycosylase and/or error-prone polymerases on that polynucleotide sequence. The term is intended to include mutagenesis that occurs as a consequence of the error-prone repair of the initial lesion, including mutagenesis mediated by the mismatch repair machinery and related enzymes.

As used herein, the term “UDG” refers to uracil DNA glycosylase, one of several DNA glycosylases that recognize different damaged DNA bases and remove them before replication of the genome. DNA glycosylases can remove DNA bases that are cytotoxic or cause DNA polymerase to introduce errors, and are part of the base excision repair pathway for DNA. Uracil DNA glycosylase recognizes uracil in DNA, a product of cytidine deamination, leading to its removal and potential replacement with a new base.

The term “isolated” refers to the state in which polypeptides or polynucleotides of the invention will be in accordance with the present invention. Polypeptides or polynucleotides will be free or substantially free of material with which they are naturally associated, such as other polypeptides or polynucleotides with which they are found in their natural environment, or the environment in which they are prepared (e.g., cell culture) when such preparation is by recombinant DNA technology practiced in vitro or in vivo. Polypeptides or polynucleotides can be formulated with diluents or adjuvants and still for practical purposes be isolated; for example, the polypeptides or polynucleotides can be mixed with gelatin or other carriers if used to coat microtitre plates for use in immunoassays, or can be mixed with pharmaceutically acceptable carriers or diluents when used in diagnosis or therapy. Polypeptides or polynucleotides can be glycosylated, either naturally or by systems of heterologous eukaryotic cells, or they can be (for example, if produced by expression in a prokaryotic cell) unglycosylated.

The term “selection” refers to the separation of one or more members, such as polynucleotides, proteins or cells, from a library of such members. Selection can involve both detection and selection, for example where cells are selected by use of a fluorescence activated cell sorter (FACS) that detects a reporter gene and then sorts the cells accordingly.

As used herein, “pg” means picogram, “ng” means nanogram, “ug” or “μg” mean microgram, “mg” means milligram, “g” means gram, “ul” or “μl” mean microliter, “ml” means milliliter, “l” means liter, “kb” means kilobases, “nM” means nanomolar, “pM” means picomolar, “fM” means femtomolar, and “M” means molar.

The phrase “pharmaceutically acceptable” refers to molecular entities and compositions that are physiologically tolerable and do not produce an unsafe reaction.

II. Somatic Hypermutation

During the generation of antibodies, in vivo point mutations occur within the variable region (V-region) coding sequence of the antigen receptor. This process is called SHM, and the observed rate of mutation is about a million times greater than the spontaneous mutation rate in other genes (Bachl et al., Increased transcription levels induce higher mutation rates in a hypermutating cell line. J. Immunol. 2001 Apr. 15; 166(8):5051-7; Martin and Scharff, AID and mismatch repair in antibody diversification. Nat Rev Immunol. 2002; 2(8):605-14).

In humans and mice, after the primary repertoire of antibody specificity is created by V-D-J rearrangement, and following antigen encounter, the rearranged V genes in those B cells that have been triggered by the antigen are subjected to two further types of genetic modification: SHM, which triggers the diversification of the variable region of immunoglobulin genes, generating the secondary repertoire and thereby allowing affinity maturation of the humoral response; and Class Switch Recombination (CSR), a specific non-homologous recombination process that leads to isotype changes in the constant region of the expressed antibody. In chickens and rabbits (but not man or mouse) an additional mechanism, gene conversion, is a major contributor to V gene diversity.

AID is expressed within activated B cells and is an essential protein factor for SHM, CSR and gene conversion (Muramatsu et al., 2000; Revy et al., 2000). AID belongs to a family of enzymes, the APOBEC family, which shares certain features with the metabolic cytidine deaminases but differs from them in that AID deaminates nucleotides within single stranded polynucleotides and cannot utilize free nucleotides as a substrate. Other enzymes of the AID/APOBEC family can also act to deaminate cytidine on single stranded RNA or DNA (Conticello et al., (2005)).

The human AID protein comprises 198 amino acids and has a predicted molecular weight of 24 kDa. The human AID gene is located at locus 12p13, close to APOBEC-1. The AID protein has a cytidine/deoxycytidine deaminase motif, is dependent on zinc, and can be inhibited by tetrahydrouridine (THU), which is a specific inhibitor of cytidine deaminases.

Even prior to the discovery of unmodified AID, it was noted that SHM occurs more frequently in cytidines that are within the context of WRCY (A or T/G or A/C/C or T) motifs. There is now accumulating evidence that this motif for SHM likely represents a composite of the hot spot motifs for AID deamination and for initiating error prone repair by the DNA polymerases pol eta and pol theta (Rogozin et al. (2004); Zan et al. (2005)).
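
By way of a non-limiting illustration, the WRCY context described above can be located in a sequence with a short scan. The Python sketch below is an example under stated assumptions and is not the method used in the analyses herein: the function name `find_wrcy_hot_spots` is hypothetical, W, R and Y are expanded using the standard IUPAC ambiguity codes (W = A or T, R = A or G, Y = C or T), and the reverse complement RGYW is also scanned so that cytidines on the opposite strand are reported.

```python
import re

# IUPAC ambiguity codes: W = A/T, R = A/G, Y = C/T.
# Lookahead groups are used so that overlapping motifs are all reported.
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")  # motif on the strand as given
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")  # reverse complement of WRCY


def find_wrcy_hot_spots(seq):
    """Return sorted (position, strand, motif) tuples.

    position is the 0-based index, on the given strand, of the target C
    ("+" strand) or of the G that pairs with the target C ("-" strand).
    """
    seq = seq.upper()
    hits = []
    for match in WRCY.finditer(seq):
        hits.append((match.start() + 2, "+", match.group(1)))  # C is base 3 of WRCY
    for match in RGYW.finditer(seq):
        hits.append((match.start() + 1, "-", match.group(1)))  # G is base 2 of RGYW
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_wrcy_hot_spots("GGTACCAGCTGG"):
        print(hit)
```

A palindromic motif such as AGCT matches on both strands, so it is reported twice, once for the C and once for the G.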

High levels of DNA transcription have been shown to be necessary, but alone are not sufficient, for AID-mediated mutagenesis. In vivo, SHM begins about 80 to about 100 nucleotides from the transcription start site, but decreases in frequency as a function of distance from the promoter. Native AID has been shown in vitro to interact directly with the transcriptional elongation complex, but not the transcriptional initiation complex, and this interaction may be dependent upon the dissociation of the initiation factors that occurs as the transcriptional initiation complex converts to the fully processive, elongation-competent transcription elongation complex (Besmer et al., 2006).

Since AID is only able to deaminate cytidines on single stranded DNA, it is likely that the requirement for transcription reflects the generation of single stranded regions by transcription bubbles. Studies with purified unmodified AID in vitro, however, suggest that AID binding is sequence independent, potentially allowing a scanning mode for hot spot capture that is driven by active transcription of the gene. In vitro studies suggest that unmodified AID has an apparent Kd for single stranded DNA in the range of 0.3 to 2 nM, and that the complex has a half life of 4-8 minutes. The turnover number of purified unmodified AID on single stranded DNA is approximately one deamination every 4 minutes (Larijani et al., (2006)).
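
As a rough, non-limiting worked example, the binding figures quoted above can be converted into rate constants. The Python sketch below assumes simple 1:1 binding with first-order dissociation of the AID:DNA complex and a constant turnover rate while bound; neither assumption is stated in the cited studies, and the function names are hypothetical.

```python
import math


def off_rate_per_min(half_life_min):
    """First-order dissociation rate constant, k_off = ln(2) / t_half."""
    return math.log(2) / half_life_min


def on_rate_per_nM_per_min(kd_nM, half_life_min):
    """Association rate constant for 1:1 binding, k_on = k_off / Kd."""
    return off_rate_per_min(half_life_min) / kd_nM


def deaminations_per_binding(half_life_min, minutes_per_deamination=4.0):
    """Mean complex lifetime (t_half / ln 2) divided by the time per deamination."""
    mean_lifetime_min = half_life_min / math.log(2)
    return mean_lifetime_min / minutes_per_deamination


if __name__ == "__main__":
    # Half-life range of 4-8 minutes and Kd range of 0.3-2 nM, as quoted above.
    for t_half in (4.0, 8.0):
        print("t_half (min):", t_half,
              "k_off (1/min):", round(off_rate_per_min(t_half), 3),
              "deaminations per binding:", round(deaminations_per_binding(t_half), 2))
    for kd in (0.3, 2.0):
        print("Kd (nM):", kd,
              "k_on at t_half = 4 min (1/(nM*min)):", round(on_rate_per_nM_per_min(kd, 4.0), 3))
```

Under these assumptions a single binding event would support, on average, only a small number of deaminations, which is consistent with the slow turnover noted above.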

AID acts on DNA to deaminate cytidine residues to uracil residues on either strand of the transcribed DNA molecule. If the initial (C→U) lesion is not further modified prior to, or during, DNA replication, then an adenosine (A) can be inserted opposite the U nucleotide, ultimately resulting in C→T or G→A transition mutations. The significance of this change at the amino acid level depends upon the location of the nucleotide within the codon within the reading frame. If this mutation occurs in the first or second position of the codon, the result is likely to be a non-conservative amino acid substitution. By contrast, if the change occurs at the third position of the codon reading frame, within the wobble position, the practical effect of the mutation at the amino acid level will be slight because the nucleotide change will be silent or result in a conservative amino acid substitution.
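
The dependence on codon position described above can be illustrated with a short sketch. The Python example below is a non-limiting illustration, assuming an in-frame coding sequence containing only A, C, G and T and the standard genetic code; the function name `transition_effects` is hypothetical. It applies each possible C→T or G→A transition in turn and reports the codon position and whether the encoded amino acid changes.

```python
from itertools import product

# Standard genetic code, built with bases in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transition_effects(coding_seq):
    """List the effect of every single C->T or G->A transition.

    Each entry is (index, codon_position, codon, mutant_codon,
    amino_acid, mutant_amino_acid, is_silent); codon_position is 1, 2 or 3.
    Trailing bases that do not complete a codon are ignored.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    effects = []
    for i, base in enumerate(seq[:usable]):
        if base not in "CG":
            continue
        new_base = "T" if base == "C" else "A"
        offset = i % 3
        codon = seq[i - offset:i - offset + 3]
        mutant = codon[:offset] + new_base + codon[offset + 1:]
        before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
        effects.append((i, offset + 1, codon, mutant, before, after, before == after))
    return effects


if __name__ == "__main__":
    for effect in transition_effects("AGCTGG"):
        print(effect)
```

For the serine codon AGC, the G→A change at the second position gives AAC (asparagine), whereas the C→T change at the third position gives AGT, which still encodes serine. Third-position changes are not always benign, however; G→A in TGG yields the stop codon TGA.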

Alternatively, the C→U lesion, and potentially the neighboring bases, can be acted upon by DNA repair machinery, which in SHM leads to repair in an error prone fashion. Studies in knock out mice have established that base excision repair via uracil DNA glycosylase (UDG) plays a role in mediating the mutation of A and T residues close to hot spot motifs (Shen et al. (2006)). Additionally, there is increasing evidence that the creation of abasic sites by UDG recruits error prone polymerases, such as pol eta and pol theta, and that these polymerases introduce additional mutations at all base positions in the surrounding sequence (Watanabe et al. (2004); Neuberger et al. (2005)). It is believed that pol eta is central to the creation of A mutations during SHM and is particularly error prone for coding strand adenosines preceded by A or T (W/A) that are preferentially mutated to G.

It has been observed that in antibody genes, codon usage and precise concomitant hot spot/cold spot targeting of AID activity and pol eta errors in the CDRs and FRs, respectively, have evolved under selective pressure to maximize mutations in the variable regions and minimize mutations in the framework regions (Zheng et al., JEM 201(9): 1467-1478 (2005)). For example, Zheng et al. observed that the precise alignment of C and G nucleotides within the codons preferentially used within an antibody gene causes most C to T and G to A mutations to be silent or conservative. Juxtaposed on the precise placement of Cs and Gs, Zheng et al. also observed the preferential placement of As and Ts in hot spots of pol eta in the variable regions and the exclusion from these sites in the framework regions.

The regulation of SHM in vivo and the determinants that direct and limit SHM to the Ig locus have been the subject of intense debate and experimental research. The rate of SHM observed in vivo has been shown to be at least partially dependent upon, for example, the following factors: 1) the AID expression levels and AID activity levels within a particular cell type (Martin et al. (2002), Rucci et al. (2006)); 2) the degree of AID post translational modification and degree of nuclear localization (McBride et al. (2006), Pasqualucci et al. (2006), Muto et al. (2006)); 3) the presence of immune locus specific enhancer regions, E-box motifs, or associated cis acting binding factors (Komori et al. (2006), Schoetz et al. (2006)); 4) the proximity of the targeted sequence to the transcriptional initiation site/promoter region (Rada et al. (2001)); 5) the rate of transcription of the target sequence (Storb et al. (2001)); 6) the degree of target gene methylation (Larijani et al. (2005)); 7) the genomic context of the target gene, if integrated into the cell's genomic DNA; 8) the presence or absence of auxiliary factors, such as Pol Eta and MSH2 (Shen et al. (2006)); 9) the existence of hotspot or coldspot sequences within the target sequence (Zheng et al. (2005)); 10) the existence of inhibitory factors (Santa-Marta et al. (2006)); 11) the rate of DNA repair within the cell type of interest (Poltoratsky (2006)); 12) the formation of local DNA or RNA hairpin structures (Steele et al. (2006)); and 13) the phosphorylation state of histone H2B (Odegard et al. (2005)).

The present invention is based, in part, upon the optimization of some of the factors above to create both a temporally and spatially controlled system for hypermutation.

III. Identification and Analysis of Polynucleotides for Somatic Hypermutation

Previous analyses of antibody sequences, see, for example, Zheng et al. J. Exp. Med. (2005); 201(9):1467-1478; and Wang and Wabl, J. Immunol. 2005; 174(9):5650-4, have been based primarily on identifying the polynucleotide sequence motifs involved in SHM rather than elaborating the underlying logical operations that connect multiple rounds of SHM during protein evolution within a polynucleotide sequence.

As a first step to developing such an improved understanding of the context of one or more rounds of SHM within the reading frame of a polynucleotide sequence, and the underlying logic of relationships within codon usage patterns, the present applicants have used a statistical approach to identify consensus hot spots and cold spots. Such a statistical approach can also be used to functionally track the consequences of those mutations at the nucleotide level and to structural consequences at the amino acid level following SHM by AID.

Starting from any given polynucleotide sequence, this approach can be used to generate polynucleotide sequences that rapidly converge to a small number of possible sequences that are optimized for the properties described herein.

Polynucleotide sequences of the present application include fully synthetic polynucleotides, fully synthetic genes, semi-synthetic polynucleotides and semi-synthetic genes. “Semi-synthetic polynucleotides” and “semi-synthetic genes,” as used herein, refer to polynucleotide sequences that consist in part of a nucleic acid sequence that has been obtained via polymerase chain reaction (PCR) or other similar enzymatic amplification system which utilizes a natural donor (i.e., peripheral blood monocytes) as the starting material for the amplification reaction. The remaining “synthetic” polynucleotides, i.e., those portions of a semi-synthetic polynucleotide not obtained via PCR or other similar enzymatic amplification system, can be synthesized de novo using methods known in the art including, but not limited to, the chemical synthesis of nucleic acid sequences.

In the present application, where the observed number of SHM mutations at a polynucleotide motif is conditioned on the underlying frequency of observing that motif, the degree to which a polynucleotide sequence or motif is a SHM “hot spot” or “cold spot” was derived from analysis of SHM mutations identified in approximately 50,000 antibody sequences. One measure of the statistical significance of a SHM “hot” or “cold” motif compares the number of times a motif is observed ($N_s$) at the site of a SHM mutation event with how often it would be expected at random ($Np_s$), where $N$ equals the total number of observed mutations in the dataset and $p_s$ is the background probability of observing the motif. A Markov chain was used to calculate the background frequency of observing any motif at the site of mutation, as described previously (Tompa, 1999), using di-nucleotide transition probabilities taken directly from human germline IGHV sequences, as shown below.

$p_{ij} = \begin{bmatrix} 0.169 & 0.270 & 0.381 & 0.179 \\ 0.289 & 0.287 & 0.101 & 0.321 \\ 0.239 & 0.219 & 0.314 & 0.227 \\ 0.155 & 0.278 & 0.413 & 0.154 \end{bmatrix} \quad \text{where } i, j \in \{A, C, T, G\}$
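
By way of a non-limiting illustration, a background motif probability can be obtained from the di-nucleotide transition probabilities above with a first-order Markov chain. The Python sketch below rests on several assumptions that are not stated herein: rows are taken to index the preceding nucleotide and columns the following nucleotide (each row sums to approximately 1), the row and column order is taken to be A, C, T, G as the set is written, and the first base of the motif is drawn from the stationary distribution of the chain. The function names are hypothetical, and the sketch does not reproduce the procedure of the cited reference.

```python
# Assumed row/column order; the text gives only the set {A, C, T, G}.
ORDER = "ACTG"
TRANSITION = [
    [0.169, 0.270, 0.381, 0.179],
    [0.289, 0.287, 0.101, 0.321],
    [0.239, 0.219, 0.314, 0.227],
    [0.155, 0.278, 0.413, 0.154],
]
INDEX = {base: k for k, base in enumerate(ORDER)}


def stationary(matrix, iterations=200):
    """Approximate the stationary distribution by power iteration.

    The vector is renormalised at every step because the published rows
    sum to slightly less than 1 after rounding.
    """
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(pi)
        pi = [x / total for x in pi]
    return pi


def motif_probability(motif, matrix=TRANSITION, start=None):
    """P(motif) = start[x1] * product of transition probabilities x_k -> x_(k+1)."""
    start = stationary(matrix) if start is None else start
    motif = motif.upper()
    prob = start[INDEX[motif[0]]]
    for a, b in zip(motif, motif[1:]):
        prob *= matrix[INDEX[a]][INDEX[b]]
    return prob


if __name__ == "__main__":
    for motif in ("AGC", "AGCT", "GTCG"):
        print(motif, motif_probability(motif))
```

If the nucleotide order or the starting distribution differs from these assumptions, only `ORDER` or the `start` argument needs to change.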

The difference between the observed and expected number of motif occurrences at the site of mutation is given by $N_s - Np_s$, where $\sqrt{Np_s(1 - p_s)}$ represents the standard deviation of $Np_s$, and the z-score for each motif is given by:

$M_s = \dfrac{N_s - Np_s}{\sqrt{Np_s(1 - p_s)}}$

where $M_s$ is the number of standard deviations by which the observed number of motif occurrences (at the site of mutation) exceeds the expected value. This metric can be used to rank all possible SHM “hot spot” and “cold spot” motifs, and to characterize the degree to which any motif is “hot” or “cold” to SHM mediated mutagenesis. For instance, those 3-mer, 4-mer, 5-mer, or 6-mer polynucleotide motifs having rank-ordered z-scores in the top 5% or 10% of all equivalent length polynucleotide motifs can be considered SHM “hot spots.” Likewise, those 3-mer, 4-mer, 5-mer or 6-mer polynucleotide motifs having rank-ordered z-scores in the bottom 5% or 10% of all equivalent length motifs can be considered SHM “cold spots.” In one aspect, the hot spot can be, for example, a preferred hot spot SHM codon or motif or a more preferred hot spot SHM codon or motif. Rank-ordered tables of the top 3-mer, 4-mer and 6-mer nucleotide sequences with their corresponding SHM mutation z-scores, describing their propensity to attract SHM-mediated mutagenesis (i.e., be more susceptible to SHM), are provided below in Tables 2 and 3.
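
The z-score and the percentile cut-offs described above can be sketched as follows. This Python example is a non-limiting illustration; the function names `z_score` and `classify` are hypothetical, and the handful of 3-mer z-scores in the usage section are copied from Table 2 purely as sample input, so a cut-off applied to such a subset is for demonstration only.

```python
import math


def z_score(n_s, n_total, p_s):
    """Standard deviations by which the observed count n_s exceeds the expected n_total * p_s."""
    expected = n_total * p_s
    return (n_s - expected) / math.sqrt(n_total * p_s * (1.0 - p_s))


def classify(z_by_motif, fraction=0.05):
    """Return (hot, cold): motifs in the top and bottom `fraction` by z-score.

    At least one motif is returned in each list.
    """
    ranked = sorted(z_by_motif, key=z_by_motif.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


if __name__ == "__main__":
    # Hypothetical counts: motif seen 150 times among 1000 mutations, p_s = 0.1.
    print(z_score(150, 1000, 0.1))

    # A few 3-mer z-scores taken from Table 2 as sample input.
    sample = {"ATA": 271.09, "AGC": 185.10, "TTT": 6.80, "AGT": 2.60,
              "GCG": -118.80, "TCG": -125.83, "GTC": -126.67, "CGT": -130.10}
    hot, cold = classify(sample, fraction=0.25)
    print("hot:", hot, "cold:", cold)
```

Applied to the complete set of motifs of a given length (for example, all 64 3-mers), a `fraction` of 0.05 or 0.10 corresponds to the 5% and 10% thresholds described above.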

TABLE 2

3-mer  3-mer z-score  4-mer  4-mer z-score  4-mer  4-mer z-score  4-mer  4-mer z-score  4-mer  4-mer z-score
ATA    271.09         AATA   249.23         TACC   92.73          ACGA   19.69          CTGG   −55.05
AGC    185.10         AGCA   225.50         GAAA   89.97          TTTT   17.21          CGGA   −56.07
TAT    178.79         ATAT   224.06         CTGC   88.23          TTCT   16.95          ACGG   −58.65
CAG    176.52         AACA   215.78         CCAA   87.55          GATC   16.55          GCCT   −61.62
ACA    161.58         ATAA   213.14         TATC   86.83          TGTA   15.70          CGCC   −62.50
CCA    156.43         ATCA   193.93         CCCA   86.81          CCCC   14.29          CTTG   −63.02
ATT    128.07         TACA   190.78         GCTA   84.30          TTCC   8.07           AGTG   −64.08
AAT    123.91         CACA   183.94         CTTA   83.60          CGCA   7.95           GGAC   −66.33
CAC    113.31         ACAA   182.20         GCAA   83.41          CCTG   6.44           CCCG   −68.14
CAT    106.72         ATTA   174.57         ATCC   82.88          AAGT   6.21           GTGA   −69.31
GCT    99.04          CAGA   172.86         GAAT   82.09          GTTA   5.83           TTGT   −70.87
TCA    92.35          AACT   171.38         ATTC   80.57          GTAA   5.54           GCGA   −71.78
TAC    90.32          AGAT   167.36         AGCC   79.90          GACT   5.46           GTTT   −73.35
ACT    84.63          ACAG   165.72         CTCA   78.97          TCCT   4.16           GGGA   −75.77
ATC    82.30          CAAC   163.72         CCAG   78.46          GACC   2.64           CGTA   −76.30
AGA    78.69          TATA   159.43         AGTA   78.05          GGAT   −0.62          TCGA   −76.40
CTA    71.32          ATAC   157.31         TAGC   76.80          TCTG   −1.62          CGAG   −78.05
GCA    70.80          ACTA   152.17         ATTT   74.50          GCTG   −2.06          AGGG   −81.46
GAT    68.06          CAGC   148.78         ACTG   74.10          GATG   −2.19          GAGT   −82.94
CTG    67.83          ACCA   146.54         TCAC   71.95          ACCG   −2.66          CCGG   −85.06
ACC    65.99          AAGC   145.36         CTGA   68.58          TTTC   −4.30          GAGG   −85.74
GAA    59.03          AGAA   144.62         CCTA   67.05          TAGT   −4.65          GTTG   −86.35
TGA    56.50          AAAA   136.44         TCTA   66.67          CGCT   −5.54          TCCG   −88.86
ATG    52.18          ACAT   135.69         AATG   66.07          AGCG   −5.58          GTTC   −89.62
CAA    48.79          AGCT   134.58         GCAT   65.56          CCCT   −7.38          CGGC   −90.00
AAA    39.39          CAAT   133.12         ACCC   62.47          CCTC   −7.50          GCGC   −91.60
AAC    37.15          GATA   131.74         TCAT   61.22          TGGA   −8.79          CTCG   −92.05
TTA    35.04          ACAC   130.35         TGCT   61.11          CTGT   −10.50         TGGC   −92.93
TAA    31.78          ATCT   128.86         CTAG   59.03          GTAT   −10.53         TCGC   −96.14
AAG    24.73          CACC   125.86         ACTT   58.98          TATG   −13.14         TGTG   −96.30
CTT    17.61          CATA   125.75         AGAG   58.81          AAGG   −13.25         TTGG   −100.73
TTC    16.92          ATAG   121.65         TTAC   57.51          CCGC   −13.98         GGTT   −102.17
GTA    15.61          TAAT   121.29         TTTA   56.94          ATGG   −13.99         GCCG   −104.21
TAG    13.84          CAAA   121.00         TCAG   56.45          CGAA   −14.21         CCGT   −105.94
GGA    11.44          TATT   120.42         ATGC   54.70          TCTT   −15.45         GTCT   −108.78
TTT    6.80           CTAA   119.93         AGAC   53.01          TGAC   −16.19         GGCC   −110.06
AGT    2.60           CATC   118.61         TGAT   51.51          CCTT   −16.61         GACG   −112.93
CTC    −1.47          TTCA   117.73         GCAC   51.04          CACG   −19.16         TGGT   −115.42
TCC    −5.22          AAAC   116.35         AGGA   50.16          GGCA   −21.99         GTGC   −117.74
CCT    −5.42          TTAT   114.64         TAAG   49.76          TCCC   −23.02         TTCG   −118.98
CCC    −7.09          AAAT   114.43         CAGT   49.09          AACG   −26.20         ACGT   −121.92
GAG    −8.26          CCAT   113.51         ACTC   46.69          CGAT   −27.41         GCGG   −124.24
TGC    −14.70         ACCT   111.92         AGTT   45.47          AGGT   −29.09         TGCG   −126.58
TCT    −18.88         TAAC   111.26         CAAG   43.20          TCTC   −29.53         TGGG   −127.63
GAC    −23.11         CTAT   110.83         CTCC   43.07          TTGC   −29.86         GTCC   −128.75
AGG    −27.85         TAAA   110.30         GTAC   42.84          CCGA   −32.32         GGGC   −132.40
GCC    −38.10         CCAC   110.05         GAAC   42.62          TGAG   −34.69         GGGG   −133.41
TGG    −40.97         AATT   109.92         GAGC   41.24          ATGT   −34.90         TCGT   −135.34
TTG    −43.86         TGCA   107.12         GCCA   40.88          TAGG   −37.28         GGTG   −135.80
ACG    −61.29         CATT   106.83         GCTT   39.88          GGCT   −38.30         CGTT   −136.77
GTT    −62.25         TCAA   104.12         CAGG   37.16          GCCC   −40.66         TGTC   −137.57
CGA    −62.60         AAAG   103.76         GATT   35.99          GGAG   −44.01         GTGT   −142.24
TGT    −64.56         TACT   101.53         GACA   35.71          TGTT   −44.49         CGGT   −144.04
GGC    −70.30         AAGA   100.90         CTTC   34.67          CGAC   −45.06         GTGG   −149.24
CGC    −82.93         CACT   100.32         CTCT   33.87          GGTA   −46.07         CGTC   −155.95
CCG    −85.43         AACC   99.86          GAAG   31.97          AGGC   −46.08         GGTC   −158.84
GGG    −97.46         GCAG   99.17          TTGA   31.29          TACG   −46.78         TCGG   −159.56
GTG    −110.90        ATGA   98.38          CTTT   28.94          AGTC   −46.82         CGGG   −159.99
GGT    −112.41        CTAC   95.93          TTAG   27.86          ACGC   −47.10         GGGT   −162.17
CGG    −116.32        TCCA   95.63          GGAA   26.38          ATCG   −48.15         GGCG   −171.27
GCG    −118.80        AATC   95.61          ATTG   25.55          GTCA   −52.15         CGCG   −172.40
TCG    −125.83        TGAA   93.81          CATG   24.39          TTTG   −52.48         CGTG   −180.34
GTC    −126.67        TTAA   93.67          GCTC   22.00          GTAG   −53.73         GCGT   −194.57
CGT    −130.10        TAGA   93.03          GAGA   21.55          TGCC   −54.56         GTCG   −207.74
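
As a non-limiting illustration of how the z-scores in Table 2 might be applied to a polynucleotide of interest, the Python sketch below slides a 4-nucleotide window along a sequence and looks up each 4-mer. The function name `window_scores` is hypothetical, and only a small subset of Table 2 is embedded for brevity; a practical profile would load all 256 4-mer entries.

```python
# Subset of 4-mer z-scores copied from Table 2 (illustrative only).
Z4 = {
    "AATA": 249.23, "AGCA": 225.50, "AGCT": 134.58, "TACC": 92.73,
    "CTGG": -55.05, "GCGT": -194.57, "GTCG": -207.74,
}


def window_scores(seq, table, k=4):
    """Return (start, k-mer, z-score or None) for every window of length k.

    None marks k-mers that are absent from the supplied table.
    """
    seq = seq.upper()
    return [(i, seq[i:i + k], table.get(seq[i:i + k]))
            for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    for start, kmer, z in window_scores("AGCTGG", Z4):
        print(start, kmer, z)
```

Windows with strongly positive scores mark candidate hot spots, and windows with strongly negative scores mark candidate cold spots, in the sense defined above.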

TABLE 3 6-mer 6-mer z-score ACAGCT 266.45 ATTAAT 248.7 ATAATA 227 CAGCTA223.27 AATATA 220.6 AATACA 215.65 AGCTAC 211.24 AGATAT 211.07 AGCTAA210.24 ATATAT 209.3 AATACT 203.19 ATATAC 192.44 ATAACT 190.78 ATATTA189.76 ATAGCA 186.89 ATACCA 186.58 ATACAA 181.41 GCAGCT 180.69 ATTACA180.46 CAGCTC 180.29 ATAGCT 180.08 AATAAT 179.41 AGCTAT 178.14 CAGCTT176.31 ATATCT 174.41 AGCTGC 169 CAGCTG 167.78 AGCTGA 167.41 AATAAA167.35 ACTACA 167.11 AACAGC 167.08 ATTATT 166.89 AAGCTA 166.44 ACTACT164.71 AATACC 164.29 TATTAT 164.1 ACAGCA 161.72 AGCAGA 160.66 AGCAAT159.61 TAATAC 159.28 AATCCA 156.67 AATAGA 156.3 TATACA 155.5 AGCTCC153.55 CATATA 152.22 ATACAT 151.77 TATATT 150.71 TAATAT 150.37 ATTACT150.2 TCAGCT 149.79 AACTAC 149.11 AAAGCT 148.88 CAGCAT 147.47 ATACAC147.42 ATAGAT 147.33 ATCAGC 147.06 AGATAC 146.34 AGCACA 146.01 CAGATA145.75 TAGCTA 145.22 TTAGCT 144.8 AAGCTG 143.55 CACAGC 141.38 ACAACT140.89 CATACA 139.87 AGCAGC 139.64 ACTATT 139.36 CCAGCT 137.43 GATACA136.87 AGCTTC 136.64 AGCTCA 136.52 ACCAGC 136.02 AAATAC 135.35 AGCTTA135.22 AGAGCT 134.71 TAACTA 134.57 TACTAC 134.52 AACTAT 133.79 ATAAAC132.79 TAGATA 132.74 AACACA 131.7 CTAATA 131.46 AATAGC 130.99 GAGCTA130.78 ATACTA 130.56 ATATCA 130.47 CTACTA 130.24 ATACAG 129.95 CCAGCA129.73 CAGCAG 129.37 AATGCA 128.88 ACTAAT 128.87 AGCTTT 128.11 ATCCAC128.11 GAAGCT 126.98 CAGCAA 126.51 ACCACC 126.44 GCTACA 126.36 AGCTGT126.35 ATAACA 126.34 AGTTAT 125.56 TTACTA 125.4 AATTAC 124.76 AATTCA123.97 CAGCAC 123.54 ACAGCC 123.25 TTAATA 122.8 AGTATT 122.69 CAACTA122.15 CAATAA 121.87 AGCAAC 121.8 ATCTAC 121.63 TACACC 121.61 AGCACC121.59 ATAGCC 120.05 TAGCTG 119.3 AAAACA 119.25 ATTATA 119.17 AGTACT118.38 CACCAT 117.87 ATCTAT 116.19 ACCATT 115.23 TACTAT 115.17 TCAGCA115.13 AGCATA 114.84 TATTAA 114.69 CAAGCT 113.83 AGATGA 113.27 GATATA112.88 TAGCTT 112.54 TATTAC 111.72 AGCTCT 111.46 TCACCA 111.34 ATAGTA110.66 ATACCT 110.48 AGCATC 109.68 TATCTA 109.46 TACAAC 108.83 GCAGCA108.59 AGTAAT 108.57 TGCACA 108.53 TTTATT 108.51 ATGATA 108.34 CAAATA108.12 
ACAATA 107.6 AATAGT 107.19 AACAAC 107.08 CACCAG 107.01 TAGCTC106.68 TACAGC 106.65 AACTGA 106.63 GCATAT 106.63 GAGCTG 106.39 ATTCAC106.22 AAATAA 105.92 TAGCAA 105.71 CCAGAT 105.22 ACCATC 105.14 AATAAC105.1 TACCAT 104.92 AGAACA 104.85 ATCATA 104.56 ATCACC 104.5 AGAAAT104.29 ATATAA 104.19 CATATC 103.97 ATTCCA 103.78 GGAGCT 102.99 TACAGA102.58 TACTAA 102.18 ATCACT 102.01 ATATGA 101.89 AAACAG 101.82 ACACAG101.77 ACACCA 101.38 ACAACC 101.23 TAAGCT 100.84 CAATAG 100.69 CTATTA100.61 TTACCA 100.56 AGTACA 100.42 AACCAC 100.39 CCACCA 100.19 AAACAC99.94 ATAAAT 99.38 GCTATA 99.35 GTAGCT 99.14 CAGCCA 99.11 TTCAGC 99AGACAC 98.97 AGCACT 98.85 CCAATA 98.8 AAACCA 98.68 CAGCCT 98.34 AAGCAC98.34 ACTGCA 98.25 AGAAGC 98.23 CCATCA 98.1 CAACCA 97.53 CAACTG 97.51ATTAGC 97.37 AATATT 96.98 ACCACA 96.82 ATATGC 96.53 GTATTA 96.49 CATAGC96.33 GTATAT 96.2 ACCAAC 96.14 CAGATC 96.05 AACATA 96.05 AGATCC 95.89CTACCA 95.82 GATCCA 95.8 ATTGCT 95.61 ACCATA 95.61 CATCTA 95.61 CCAGCC95.4 ACCTAC 95.39 TCAACT 95.32 ATGCAC 95.22 GAAATA 95.07 TATAGC 94.95TACCAC 94.81 AGCTAG 94.59 CCATAT 94.32 TATATA 94.2 CATATT 94.16 TAATAA94.05 AGAACT 93.81 TATCAC 93.66 CACCAC 93.38 AAAGCC 93.36 CTACAG 93.16GCAGAT 93.16 AGATCA 93.03 ACTTCA 92.78 ACACAC 91.91 ACCACT 91.48 AAGCTT91.27 ACCAAT 90.89 CTAGCT 90.83 ATTTAT 90.72 CAGTTA 90.71 CATAGA 90.61ATACTG 90.19 ATTACC 90 TATCAT 89.91 ACTATA 89.16 TACACA 89.01 GCTGAA88.67 CCATTA 88.62 TGCTAT 88.19 TACATA 88.12 CACCAA 88.08 ATAGTT 87.88CACCTA 87.77 GCACCA 87.64 CTATCA 87.58 GCTATT 87.58 TATTAG 87.34 CCACCT87.28 AGAACC 87.26 ACTACC 87.25 TATAAT 87.06 ATTTCA 86.86 TAGCAG 86.76AAGCTC 86.67 AACCAA 86.61 AATATC 86.37 TAGTAA 86.29 GCTGAT 86.25 TATATC86.21 TAATTA 86.14 AACCAT 86.06 ATAGAC 86.03 CCATCT 85.84 TTATTA 85.75TCAGCC 85.73 ACATAC 85.65 ACATAG 85.6 CACAAT 85.55 GTAATA 85.54 GAAGCA85.45 TCATAT 85.24 CAGCCC 85.03 ACCTAT 84.68 AGCCAC 84.68 CAGTAA 84.62CCAACA 84.17 AAAAGC 84.12 AACTGC 83.95 CCAACT 83.78 ATCATT 83.47 AGAGCA83.38 GATACT 83.35 CCACAG 83.35 ATAATT 83.26 TAAACA 83.21 
ACATAT 82.99GCTACT 82.86 CAGTAT 82.76 ATCACA 82.36 TCAACA 82.34 AGCCCA 82.25 AATTAT82.21 ATCATC 82.17 TGCTAC 81.84 GCTTCA 81.55 CCACTA 81.49 GCTGCA 81.44TAGTTA 80.97 AATCAA 80.92 CAATTA 80.84 CTGCTA 80.71 ATATAG 80.66 TGCACC80.52 AAGACA 80.5 TAATAG 80.31 TGCAGC 80.23 CCTCCA 80.17 GATGCA 80.15AACTCC 80.09 TCCAGC 80.02 ACACTG 79.79 TATAAC 79.77 TTATAA 79.58 CAACAA79.5 GCTAAT 79.35 TGATAC 79 AGATCT 78.63 ATAACC 78.57 AGAAAC 78.2 ATTGCA78.18 AACACC 78.06 TGCATT 78 CAACTC 77.9 GTACTA 77.86 ACTCCA 77.83CAGATG 77.71 TGCAGA 77.69 AAGAAA 77.67 TCCACC 77.66 TAACCA 77.39 TAACAG77.34 TTATAT 77.04 TCTATT 76.92 ACACTA 76.75 CACTAA 76.68 GTAGCA 76.59AGCCAT 76.52 TCATCT 76.5 CACTAT 76.28 CAATAT 76.05 CACAGA 76.03 AGTTAC75.97 ATACTC 75.91 TATATG 75.77 CACTAC 75.68 ATTTCT 75.56 TACCAA 75.44GCAATA 75.24 ATCTCA 74.72 ACAGAT 74.63 TCACCT 74.58 CATCAG 74.49 TCAGAT74.33 AGTAAC 74.08 CTACAC 73.7 AATGAT 73.53 ATTAGT 73.5 TAGTAC 73.49TAACTG 73.35 AAAATA 73.29 AAAACT 73.19 ATTTAC 72.97 ATCTGA 72.97 ATCCAT72.95 ATACCC 72.75 AACTTC 72.62 AATACG 72.39 AAATCA 72.22 TTCACA 72.18CAGATT 72.08 CAGAAA 71.97 ACACAT 71.91 AAGATA 71.91 CTGCAG 71.63 GCAACT71.57 GATATT 71.57 AGATTC 71.53 ACCAGA 71.47 CTATAT 71.38 TGATAT 71.06AAGAGC 70.89 ATACGC 70.65 CTGATA 70.47 GATAAA 70.39 ACATCC 70.36 AAACTA70.26 ATCAAT 70.13 GAAACA 70.11 CATCAT 70.01 AGCTTG 70.01 TGAGCT 69.96CTATAA 69.96 ATTCAT 69.85 TACTGC 69.83 CAGAGA 69.69 CATTTA 69.68 AGCTGG69.06 GAATCA 68.99 TTATTT 68.98 ATCTGC 68.96 TAGCAC 68.84 ATGCTA 68.58TATACT 68.54 TCATCA 68.5 AGATGC 68.48 ATAGCG 68.46 CATACT 68.15 TAGCAT68.15 TACAAA 68.02 TACCTA 67.99 CATCTT 67.88 ATCAAC 67.83 ACCTTC 67.82TTAGCA 67.82 AGTAGC 67.72 TTGCTA 67.61 TAAGCA 67.57 AATATG 67.49 TCACTA67.42 CATTAA 67.2 AGCAAA 67.17 GGCTAT 67.15 ATGCAA 67.06 ACACCC 67.05GCAGTA 67.04 AGTAAA 67 TTCACC 66.71 GATACC 66.69 CTACAA 66.54 CTGAAA66.27 ATGTAT 66.24 CACCTT 66.08 ACCCAG 65.77 ATATCC 65.64 CAAAGC 65.58ACAGTA 65.5 CATACC 65.47 TGAATT 65.43 TATTCA 65.2 GATATC 65.15 ACAAAT65.04 CCATTT 64.91 AAAAAC 
64.81 GCTCCA 64.64 AAGCCA 64.61 CCTTCA 64.45GAGCTT 64.45 ATAGAA 64.31 TGAAGC 64.22 GAACCA 64.2 ACAGAC 64.16 ACAGAG64.14 TGTATA 64 TGAACC 63.94 TTATCA 63.94 AACAGA 63.94 GATTCA 63.93ATGAAT 63.83 GCTGCT 63.71 CACACA 63.58 GCAGCC 63.54 TAGCCA 63.4 GAGCTC63.35 AACTCA 63.19 GTATCA 63.01 CATAAT 62.96 TCCACA 62.68 CAGAAG 62.65CCCAGC 62.57 CGCTAT 62.55 CCTACT 62.52 CAATAC 62.45 CAACTT 62.28 AGAATC62.21 GAGCAC 62.17 TCTGCA 62.09 CAATCC 61.99 AGAATT 61.72 CATTAC 61.65ACTGCT 61.63 AACACT 61.62 GTAACA 61.62 TATCAG 61.58 ATGAAC 61.56 CAACAT61.55 TCAATA 61.47 TGCATC 61.37 GCACAG 61.24 AGAGCC 61.12 AGTATA 61.1GTAGAT 60.86 TACACT 60.8 TATCCA 60.75 AGCATT 60.65 ATTAAA 60.65 ACAAGC60.61 ACTGAT 60.54 CAACAG 60.42 ATGCTG 60.37 TATCAA 60.3 AGTTGA 60.16TTTACT 60.02 CTTCAC 59.96 GAAGAT 59.8 CATCTG 59.68 ATCCCA 59.65 CAACAC59.49 AACATC 59.39 AAGCAG 59.37 CATCAC 59.3 ACTAGC 59.24 ACAACA 59.21CATAAC 59.02 TATTTC 58.98 CCATAA 58.89 CACCCT 58.6 ACACCG 58.31 TACTAG58.31 TGAATA 58.12 ACAATC 58.11 AGGAGC 58.09 TGAGCA 57.87 TATGAT 57.78TATACC 57.77 GATATG 57.64 TCTGCT 57.47 AGTAGT 57.38 ACCAAA 57.17 TGTAAT57.16 CAGCGA 57.12 AAGCAT 57.06 GATGCT 57.03 CATTTC 56.98 AAGATG 56.93ATCCAG 56.88 CATATG 56.87 TGGATT 56.83 TGCAAC 56.76 CACCTC 56.75 CAGACT56.73 ATGCAG 56.72 GTAACT 56.7 AGTAGA 56.45 TATGCA 56.42 GGAATA 56.3AGTATC 56.23 CATTAG 56.19 CAGTAC 56.18 TACATC 56.14 AAAGCA 56.13 TCTCCA56.01 ACAGAA 55.96 GGAGCA 55.88 CAGCCG 55.8 CTGCAC 55.6 AGCAGT 55.46CACATA 55.45 TATCTG 55.37 TACTCA 55.36 CTTATA 55.34 GACACA 55.17 TGTATT55.14 GAATCT 55.12 AACAGT 55.1 ATCAGA 55.06 GCATCT 54.8 AACTAA 54.79CAGCGC 54.76 ACACAA 54.74 TAACAA 54.73 TGCATA 54.73 TTACAG 54.68 GAAGCC54.6 AAGAAC 54.37 TTACTG 54.36 GTTTAT 54.25 ACCAGT 54.25 AATCCT 54.22ACAAAG 54.18 TCACAG 54.18 ACTATG 54.15 GATGAT 54.08 TGCAAT 54.03 GTAATT53.95 TTAGTA 53.95 CATGAA 53.93 CATCTC 53.89 AGCCTC 53.8 CACATT 53.79AATTAA 53.78 GCACAT 53.76 ATTGAT 53.75 AAAACC 53.75 TACCAG 53.61 ACTAGT53.57 AAAGAT 53.54 CTCCAA 53.42 CACACT 53.37 CCACAA 53.24 TACAAT 
53.13CTATTG 53.01 TAGTAG 52.94 GATCAT 52.84 AATCAT 52.81 ATTCAG 52.71 AGTACC52.64 AAAAAT 52.58 CAGAAC 52.37 ACAGTT 52.35 TGAAAT 52.33 GAGATC 52.3CATTCA 52.24 CGAGCT 52.22 GATAGC 52.17 TCATTA 52.11 CTCCAG 52.03 CAGAGC51.98 TGCTGA 51.92 CCAAGA 51.92 ATAAGC 51.86 TTACAC 51.85 AGATGG 51.72TCTACT 51.69 TTACAA 51.68 TGCAAA 51.62 TAGTAT 51.42 TTTATC 51.26 CCCAGA51.25 GACTAC 51.19 ATTCTA 51.19 CAAAAA 51.15 ATACTT 51.15 ATACGA 51.08ATCTTC 51.06 ACATCA 51.04 AACCCA 51 CATAAA 50.95 TGAAGA 50.88 TAGATG50.83 CTGCAT 50.78 CAAGCA 50.65 AAATCC 50.5 GAACTA 50.47 CTATGA 50.36ACTTAT 50.3 CCAAAT 50.25 CCTGCA 50.24 TACTCC 50.15 GAGCAG 50.07 TACCCA50.02 ACCTCC 49.97 GTTATA 49.88 CATCAA 49.87 TGATAA 49.86 AATCAC 49.84ATTAGA 49.71 CATCCC 49.63 GTATTT 49.61 ACCTGA 49.59 ACTGAA 49.51 CATCCA49.5 TAACAC 49.46 AGAGAT 49.39 AGCATG 49.33 CAACCC 49.27 ACTTCT 49.23ATGATC 49.2 GATAGA 49.19 GAACAG 48.99 CCAAAA 48.88 GAAACT 48.8 GACAGC48.76 CAATGA 48.7 ACAAGA 48.64 CTCAGA 48.55 AGATAA 48.54 CTAGCA 48.43ATCAAA 48.36 TCTTCA 48.34 GATGAA 48.34 ATCCAA 48.27 AACCAG 48.27 CACATC48.25 TCCAAC 48.16 TAAAGC 48.1 AGACCC 48.09 CAGGAA 48.07 TTAACA 48.04TTATTG 48 CATGGA 47.99 CTTCCA 47.96 CAGTTG 47.94 ATATGG 47.86 GTATCT47.79 CTTCAA 47.73 GAGAAC 47.72 TTCACT 47.71 AAAGAA 47.71 ACACCT 47.51AGTTCA 47.47 ACCTGC 47.45 TATGCT 47.44 TTGTAT 47.43 ACAGGC 47.42 TCCATA47.27 TATTCC 47.17 GGCTGA 47.15 TGCTAA 47.05 ACCCCA 46.96 GTAGTA 46.89ATCCTA 46.79 CGCATA 46.68 AATTCT 46.54 GGATCT 46.23 TTATAG 46.2 ACTAAA46.2 CAGACA 46.2 GTACCA 46.16 CAAAGA 46.13 ACTCCT 46.11 CACAGT 46.1AAACCT 46.05 CGCTGA 46.02 AATGAA 45.98 GTTACT 45.95 TACAAG 45.86 AGGAAT45.81 ACTCAA 45.79 ATGACA 45.7 ACCATG 45.69 CATAGT 45.61 ATATTG 45.6AGGTAT 45.57 CTCAGC 45.54 ATATTC 45.46 CTACTC 45.36 TACAGG 45.33 CCTCAG45.33 CACTGC 45.24 GCACCT 45.13 ACTATC 45.05 CTGCTG 44.96 AGCCTT 44.9GGTATT 44.89 TAAATA 44.79 TTCCAC 44.78 CAAAAG 44.78 TTTCAG 44.77 TAATGA44.74 TTACAT 44.73 AACCCC 44.73 ATGGTA 44.66 CACTGA 44.64 CAAATC 44.64CATGCT 44.62 GCTTCT 44.61 TCCATC 44.59 
TCAGTT 44.56 ACTGCC 44.54 CTTCAT44.49 TGCTCA 44.45 TGGAAT 44.41 CTTCAG 44.4 ACATCT 44.4 CACCTG 44.39ATGCAT 44.36 CCAACC 44.33 CATTAT 44.25 CTAGTA 44.22 TACAGT 44.18 TACTGA44.12 CTACTG 44.1 TAGAAT 44.07 ACAGCG 44.06 ATGGAT 44.04 TTCATA 43.92ATAAAA 43.84 ACTCAG 43.83 CTGCAA 43.65 CAGGCT 43.52 TGATAG 43.5 AGAGAC43.5 CCATGA 43.49 CTACTT 43.4 ACATTA 43.36 GAATAG 43.29 GCAGTT 43.25CACAAA 43.25 TGAACT 43.25 TGAGAT 43.21 CACTAG 43.13 CCCCAT 43.06 CTAACA42.92 CCAGTA 42.86 CTCCAT 42.76 CAAGAT 42.74 GAACCC 42.71 CCAGAA 42.65TTCATC 42.62 AACCTG 42.6 AGCCCC 42.52 CCTACA 42.47 GGATAT 42.47 TCCACT42.41 ATTACG 42.39 AAGATC 42.32 AGCCTA 42.29 ACACGG 42.21 CTGAAT 42.18CTATTC 42.04 ACAATG 42.01 TCATAA 42 TGAATC 41.89 ATCAGT 41.74 GATTTA41.74 AATCTG 41.72 GCTGGA 41.71 AGCGAT 41.68 TATTTT 41.67 GAATCC 41.64TTTACC 41.63 AGCAGG 41.62 AAATAT 41.58 ATTATC 41.55 GAGATA 41.47 CCAGGA41.41 TCATAG 41.39 GCTTTT 41.33 ATGACT 41.26 GAACTG 41.19 CTGAAC 41.19GGCTAC 41.14 AGCTCG 41.12 ACCCAC 41.04 CAATCA 41.01 AGCGCA 40.99 ACTCCC40.96 CTCCAC 40.95 AATCTA 40.93 GCATCA 40.9 ATTTTT 40.87 TGAAAA 40.84TCACAT 40.84 ATTCCT 40.83 TTGATA 40.69 CACAAC 40.69 TATTGA 40.61 AGGCTG40.57 AATGCT 40.53 TATTTG 40.53 CAGGTA 40.51 CATGCA 40.5 AAACTG 40.46AACAAA 40.38 CTTTCA 40.38 CAAACT 40.38 TATTTA 40.37 GGAACA 40.37 GCCACT40.35 CGCAGC 40.24 TAAATC 40.2 AGGTAC 40.19 ACTGTA 40.17 GAAGGA 40.16CAGTTC 40.09 TTTTAC 40.04 TGAACA 40 GCTATC 39.99 GCTTTA 39.98 ATTAAC39.98 GAATAT 39.96 CCATCC 39.94 TACCTG 39.93 CAAACC 39.91 CACTTC 39.84TTATAC 39.76 TTGCAT 39.73 CTGTAT 39.67 GAAACC 39.64 AGTGAT 39.53 CAAGCC39.3 AGGATT 39.29 CAGTAG 39.29 AGAATA 39.23 ATGCCA 39.23 GTGATA 39.2AATCCC 39.2 AACAAT 39.16 GAAGAA 39.02 TAACAT 39 CAAACA 38.97 AGGATA 38.8AAATGG 38.8 TTTAAT 38.75 TTTACA 38.66 GACACC 38.6 CTTACT 38.54 TAAAAC38.52 TCAGCG 38.41 TTTGCA 38.37 ACAAAC 38.35 GATCTC 38.32 TGGATC 38.23AAAAAA 38.16 CACGAT 38.16 TTTTCA 38.15 AAACAA 38.11 AATCAG 38.1 ATGAGA38.04 CCAATT 38.03 CTATAC 37.99 AGGACA 37.98 GAACAA 37.98 TCCAAA 37.84TTTCCA 37.82 
ACTGGA 37.81 AAGCAA 37.77 ATGAAG 37.77 ACAAGG 37.76 AAGCCC37.72 GCTCCT 37.68 ACACGA 37.64 AGCCGA 37.6 CCAGCG 37.57 ATCCCC 37.48TGTAGC 37.33 AGCCGC 37.29 TCAGAA 37.28 TAAAAA 37.16 GATAAT 37.15 TCCTAC37.13 TACTTC 37.09 GAAATG 36.99 ATATTT 36.91 GAACTC 36.81 CTAATG 36.79AACAGG 36.76 AAGGCT 36.76 TCCAAT 36.72 TATGAC 36.67 ACCTCA 36.63 TGATGA36.62 AAGCCT 36.59 GAGACA 36.59 ATGATT 36.47 CCACCC 36.46 GCAATT 36.27CCCACA 36.26 TACTTA 36.25 TGACCA 36.23 CCATAG 36.13 ATTCCC 36.08 CCCACT36.08 AAACCC 35.99 GAACCT 35.97 GTTATT 35.96 CCATAC 35.9 TTCTAC 35.9ATGAGC 35.85 GATCAG 35.85 TATGAA 35.79 CAAGAA 35.7 TATAAG 35.62 ATCTCC35.59 ACTACG 35.54 GAACAC 35.49 TATTGC 35.48 TAAATG 35.47 ATGAAA 35.43GATCTG 35.38 TATAAA 35.37 ATACGG 35.34 ATTATG 35.3 CAAGGA 35.22 AAATAG35.19 AAGACT 35.13 ACCCCC 35.07 AGATTT 35.05 GAGCAT 35.02 CCCCAA 35.02AAATGC 35 TGATCA 34.95 GAGCCC 34.9 ATCTGG 34.82 AGAAGT 34.81 ACTAAC34.76 TGGAGA 34.73 TAATCA 34.7 CAACCT 34.69 GACCAC 34.64 GTAAAA 34.56TCTACC 34.54 GATTAC 34.54 CCAGTT 34.52 ACCAGG 34.5 GCAACC 34.48 ACATTT34.47 ACTTCC 34.46 AAGTAC 34.43 ACCTTA 34.43 TAATTG 34.26 CACCCA 34.26ATCTTT 34.13 TTAATT 34.07 TTGCAC 34.06 CACCCC 34.06 CATGAT 34.02 ATAGGT33.92 GCTACC 33.92 ATAGAG 33.86 AGTTCT 33.81 TGCTTA 33.8 GCTGTT 33.73AAGAAT 33.68 GATTCT 33.67 ACCGCC 33.57 ACAGGG 33.56 CAAGAC 33.52 CCACTG33.47 AAGTAA 33.38 TGTACT 33.36 CTGAAG 33.36 AGACCT 33.33 ACTAGA 33.32AAATCT 33.23 GCTATG 33.22 TTGATT 33.18 TGCTGC 33.18 AGAAGA 33.16 AATGGA33.11 TTCCCA 33.1 AATGGT 33.08 GTTACA 33.07 TCAGGA 33.04 TACACG 32.96TTACTT 32.93 TAAAGA 32.93 CACTTT 32.87 AACTGG 32.82 CTCACC 32.81 ACATGC32.79 AGCCTG 32.79 TCCCAG 32.78 ACATGG 32.77 CACTTA 32.69 CCCCCA 32.63ATGATG 32.59 GCAGAG 32.58 ACATAA 32.53 AAAGTA 32.47 AAAAGA 32.46 GAACAT32.46 CAATTC 32.4 CCACTT 32.39 GGCTTT 32.37 TTCAAC 32.34 GCTTAT 32.32CAGGAT 32.32 AGCCCT 32.3 CAATGC 32.26 TGTATC 32.2 TGATCT 32.2 CTGTTA32.12 ACAATT 32.12 TATCTT 32.05 ATTCAA 32.04 TTCAAA 32.03 CAGACC 31.98ACATGA 31.9 CTAAGC 31.75 CTAAGA 31.7 ATAAAG 31.69 
AACTAG 31.56 GTACCT31.55 AGATAG 31.51 CAAAAT 31.5 GTGAAT 31.48 AGCCAA 31.4 GAGATG 31.33GGAGAA 31.29 AATTGC 31.29 ATGGCT 31.23 GCAAAT 31.22 TAGAAC 31.2 ATGGAA31.19 GATGGA 31.15 CTGCTC 31.09 CCAGAC 31.09 ACTCAT 31.09 CGAACA 31.02AGCCAG 31.01 GGATAC 31.01 GCAGAA 30.98 GTAAAT 30.95 TTTATA 30.85 TGCTTC30.8 CTCAAC 30.7 AAAGAC 30.65 GCTCAA 30.56 ACAGTC 30.55 CACAAG 30.53TGGATA 30.52 GCATAG 30.51 ACCTGG 30.5 CTCCCA 30.43 TGATTC 30.33 GCTGTA30.33 GCATAC 30.26 TCAAGC 30.25 CAGAAT 30.22 TCATAC 30.18 CATCCT 30.14TGAAAC 30.04 AAACTC 30 GCATTT 29.91 AAGGAC 29.86 ACAAAA 29.84 GAGTAT29.79 AAATGA 29.74 AGCGGA 29.72 GAATTA 29.71 AGTGAA 29.7 AACAAG 29.69TCAAGA 29.63 AACCTT 29.53 GAATAA 29.53 CTCACA 29.49 TCACAA 29.46 CCCATC29.46 TGTGCA 29.41 ATTGGA 29.27 ATTGAA 29.23 ATAATG 29.22 CCTTTA 29.21GGAACT 29.21 TTCAGA 29.18 GCAACA 29.12 ATAATC 29.11 CTCATA 29.07 GAATAC29 CTGATC 29 ACCAAG 28.96 CACAGG 28.94 ATTTCC 28.86 GCATAA 28.83 TCCCAC28.82 GAGCAA 28.81 TCCAGA 28.65 TTCCAT 28.63 GGCACA 28.6 TTTCTT 28.55TAAACC 28.53 AAATTA 28.46 CTTGCA 28.46 ACCTCT 28.41 TCAGTA 28.39 GAAGTT28.37 TACATT 28.33 GACCCA 28.32 GACCAT 28.29 CCACAT 28.23 CATTTT 28.22ATCGCT 28.15 AAGGAA 28.11 TATAGT 27.92 TAACTT 27.89 CTTAGC 27.87 CTTAAG27.83 CCTGCT 27.78 GATACG 27.7 TAGACA 27.69 GGTTCA 27.68 ATGCTT 27.68TTCATT 27.66 TAATCC 27.62 ATATGT 27.59 CCACGA 27.56 AAAATC 27.56 GAAGTA27.52 TGCTCC 27.5 CCCATA 27.47 TTAACC 27.43 TAGAGC 27.38 AGGCTA 27.34GCAAAA 27.32 GCTCAT 27.31 AGGACC 27.3 AGACTA 27.27 CCATTC 27.24 ACGACT27.24 AGGAAA 27.08 TTCCAG 27 TCACCC 26.94 AAATTC 26.9 AACTGT 26.84TTCTAT 26.75 TAATGG 26.71 ACAACG 26.66 AGGTGA 26.64 AGGAAC 26.59 TGGTAT26.57 AAACAT 26.53 AGTTGC 26.52 CAGTGA 26.47 GATTCC 26.47 AGCGAC 26.44ATCAAG 26.44 ACCCCT 26.4 CCCCAG 26.4 CGTATT 26.39 TACTTT 26.39 AGACAG26.37 TTATGA 26.36 CAAGAG 26.32 TGCAGT 26.31 AGGAGA 26.3 CCATGC 26.27GAAAGC 26.23 ACGATA 26.23 CAAGGC 26.22 CTTTAT 26.22 CATTCC 26.22 GAAAAT26.2 CATTGC 26.13 TATACG 26.08 GTAGAA 26.03 GGACCA 26.02 GCTCTT 25.97TGTTAC 25.87 TCCCCA 25.78 
TCCATT 25.78 AGAAAA 25.72 CCCAAG 25.69 GTGCAT25.62 TTTTAT 25.58 ACCTTT 25.53 CTACGA 25.52 CCTTAA 25.52 GGCATA 25.52GAAAAC 25.47 AGTTTT 25.42 GAATTC 25.36 GATCAC 25.35 CACACC 25.27 AAGCCG25.26 ACTGAG 25.25 ATCTAA 25.24 AGACTG 25.18 AAGTTA 25.15 TCACTG 25.11ATCGCA 25.08 CGATAT 25.02 GTCATA 24.99 AACCCT 24.98 TTAATG 24.97 ACTTTT24.96 ACGCAG 24.82 ATTTAA 24.79 TGATTT 24.76 CTGATG 24.75 ATCTTA 24.75TATGTA 24.71 GAAGAC 24.69 TTACCT 24.69 TAGATT 24.68 ATAAGT 24.67 CGGATA24.54 CTTTTA 24.43 ACCACG 24.41 ACAGGA 24.4 TATGGA 24.4 TTACTC 24.37GCAAAG 24.34 GAGGCT 24.32 ATCATG 24.24 TGTTAT 24.2 GCAAGA 24.19 CTGGAA24.11 CTATTT 24.06 TCCATG 24.06 AGTGCT 24.05 AGCGCC 24.04 CTGTAA 24.03GAGCCT 24.03 ACCCAT 24.03 TGGAGC 23.99 ATGGAC 23.95 CAGCGG 23.91 TAAGAA23.9 GCATTA 23.88 AGTCAT 23.86 GGAACC 23.86 CCCTCA 23.86 AACCTA 23.83CTTACA 23.77 GGTAAT 23.77 GGAGCC 23.69 CCCACC 23.65 GGAGAT 23.63 GTAGTT23.62 CTGAGC 23.61 TTTCAC 23.61 CTGAGA 23.59 CATAGG 23.58 TTTCAT 23.55AAGTAT 23.48 AATTCC 23.45 TACATG 23.39 GGAAAT 23.35 TGACCT 23.35 CGCACA23.34 TACGAC 23.32 ATTTTC 23.32 CCTGAA 23.3 ACAGTG 23.28 AATCGA 23.28ATCTCT 23.2 GACATG 23.19 AAGTAG 23.18 ATACCG 23.16 GGCAGC 23.07 TCTACA23.02 CTAAAA 23 ACACGC 23 ACCCTG 22.98 TGAAAG 22.87 CACATG 22.71 CCTGTA22.67 TGGTAA 22.66 CAGAGT 22.64 CCGCTA 22.64 GGAATC 22.63 TTCAAT 22.52CTGCTT 22.49 CCTATT 22.49 GGTGCA 22.48 CAGGAG 22.48 CCCCAC 22.46 AGGCTC22.43 CTAACT 22.4 CCAAGC 22.4 GCAGAC 22.36 CCAGGT 22.36 ACTGTT 22.3ACCCTC 22.25 CTATGC 22.23 TCTAAT 22.15 TGGAAA 22.14 CAGTTT 22.08 TAATTC22.08 TCACTT 22.06 TTTTTA 22.01 CCTTCC 21.92 ATCGAT 21.89 AAAATG 21.87GCACAA 21.78 TGCACT 21.71 AAGACC 21.69 AATTGA 21.68 GCATCC 21.65 CACTGT21.65 GAAAAA 21.64 GCTCAG 21.6 AACACG 21.59 GTTGCA 21.57 GCCCCA 21.54GACTAT 21.53 GACCAG 21.52 GTTCAT 21.39 GAGAAT 21.24 TAAAAG 21.2 GAATTT21.15 CACCGT 21.13 GATTAT 21.11 TTTCAA 21.05 ATCCTC 21.03 CTGGAT 21CCTATA 20.97 ATAGGA 20.97 TAGGTA 20.96 GGATTT 20.93 ACTCAC 20.88 CGACTA20.85 GGATCA 20.8 CTACCC 20.78 ACTTAC 20.74 GATAAC 20.71 GATCCC 
20.66TACGCA 20.62 GCCACC 20.56 AGACTC 20.56 GACTCA 20.5 CCTTAT 20.39 TAGGAT20.38 AACATT 20.37 ATGCTC 20.32 ACTCTA 20.3 CTGCCA 20.29 TGGCTA 20.29AGTCCA 20.26 CAGTCA 20.24 TTCCAA 20.24 GACATA 20.22 TCTATC 20.15 TCCTGA20.13 ATGGCA 20.05 GTAGCC 20.05 CCTGGA 20 CTTAGA 20 AACGCT 19.94 CGCTAC19.9 CTGTAG 19.87 CACTCA 19.87 CTTCTA 19.83 TCCTTC 19.8 CAAGTA 19.73ATCAGG 19.71 TATTGG 19.66 AGTTCC 19.66 ACACTC 19.6 AATTTA 19.59 ACATTG19.58 GAAATC 19.45 TGAAGT 19.45 GTACAT 19.44 CTTTAA 19.44 CATTGA 19.38GGCTTC 19.38 CACGAA 19.33 TATCCT 19.28 ATGGAG 19.27 AATAGG 19.25 GTATAA19.24 AATAAG 19.23 GGATTC 19.19 TCTATG 19.13 ACCCTT 19.09 ACTTTA 19.01CCAATC 19 TCTGTA 18.99 GCTCTA 18.93 GATCTT 18.92 GGATTA 18.85 CGTATA18.83 ACGAAC 18.75 ATTCTT 18.75 AGGTCA 18.72 TAGAAA 18.72 CGTAAT 18.7GTACAG 18.63 ATGTAA 18.6 TTCATG 18.6 AGTTTC 18.56 TAGTTG 18.52 TGGACA18.5 ATTTGC 18.49 CACCGC 18.45 CTCTAT 18.44 CAATCT 18.42 GAGAAG 18.39ACATTC 18.38 ATTTGA 18.37 TTGCAA 18.35 AAGATT 18.34 AAAGGA 18.34 ATTGTA18.33 TTAAAA 18.28 ATATCG 18.27 ATAGTG 18.25 GAGACT 18.19 GCTTAA 18.18TGATTA 18.16 GGATCC 18.16 AGCACG 18.12 AACCGC 18.1 TTGCTG 18.05 CCAAGG17.94 AGGCTT 17.91 CGCAAA 17.91 CCGATA 17.87 TCAAAT 17.85 CCGAGA 17.85GCCATT 17.84 GCCATA 17.82 GCACTA 17.75 ACTCTG 17.67 AGTAAG 17.64 CGCTCA17.58 TATCCC 17.54 AACTCT 17.47 TCCACG 17.46 GGAGAC 17.43 CTTGAA 17.42TCTCAT 17.31 TAGCGA 17.31 CTAAAG 17.28 CACTCC 17.24 CCGTAT 17.21 GAGAAA17.2 AACTTT 17.19 CACTCT 17.18 GACTCC 17.16 GCACCC 17.12 TTATCT 17.12TAGCCT 17.07 CCTACC 16.97 TAAGAT 16.95 GCAATG 16.95 GGTAAA 16.95 AAAATT16.92 AACGGC 16.9 CTATCT 16.81 TATCTC 16.81 GCTCCC 16.8 CTGACA 16.79CATGGC 16.78 GACCTC 16.77 CCTTGA 16.76 CTCATC 16.72 CACGGA 16.69 CTATGT16.65 TAGAAG 16.62 CATAAG 16.58 GGAAGC 16.58 CGCAGA 16.48 AACGCA 16.48CGAAGA 16.41 TAACCT 16.4 CTGATT 16.33 CAGGCA 16.28 GAAAAG 16.25 CCCAAC16.24 TAGTGA 16.23 TTGCAG 16.2 TGAAGG 16.18 TTTGAA 16.15 TACCTT 16.14GCACAC 16.12 ATGACC 15.97 TTAAGC 15.91 GTTGCT 15.9 CATGTA 15.9 ACGACC15.86 CAGGTT 15.84 AAAAGT 15.82 AGACCA 
15.79 GCTTGA 15.71 GATGTA 15.67TGACAT 15.66 TTCTCC 15.65 TTAGAA 15.63 TTAGAT 15.61 ATTTTA 15.6 TTAAAT15.52 GGTACA 15.49 CATCGC 15.48 GCCATC 15.39 AATTTT 15.39 TCAATC 15.38ACCCAA 15.38 CTGTTT 15.36 CCAGAG 15.35 AGAAGG 15.33 TCATTT 15.32 CCAGTC15.26 AGTAGG 15.25 TGCAAG 15.23 AGGATC 15.22 GACAAC 15.19 TCCTCC 15.19TCAATT 15.18 TCAAAA 15.15 CCTGAT 15.13 ATCCGC 15.08 GACCTT 15.07 TTATTC15.07 GCTAAG 15.01 CTCAAG 14.96 CAGGCC 14.89 ATGTAC 14.83 CTTCTG 14.71AGACAT 14.69 TAAGTA 14.61 TTGAAG 14.6 ATGTTA 14.54 TGGAAC 14.52 GGCTCC14.47 ATAAGG 14.45 CTTATT 14.45 ATCCTG 14.42 TGTTTA 14.41 TGAGAA 14.39CACGCC 14.39 CCATGT 14.39 ACGCTG 14.36 TCCAGT 14.34 CTACAT 14.31 AGTGCA14.28 AATCTT 14.25 GGCTCA 14.24 CCCTAT 14.21 CCAGGC 14.21 CTGGAG 14.2ACCCGC 14.16 GGTATA 14.14 GACTTC 14.11 AAGAGA 14.08 GCTTCC 14 AGCGCT 14AGACTT 13.99 AAACGC 13.99 TCACCG 13.98 CACGCA 13.93 CCCAGG 13.91 CTCTGC13.88 CGAGAA 13.83 TATAGA 13.82 AAAGCG 13.82 GAGTAA 13.8 GATTGA 13.77TTGAAA 13.74 TAATTT 13.71 AGTTGT 13.57 GGAGTA 13.54 TAAACG 13.52 CCGCTG13.48 GGCTGC 13.46 GGTACT 13.42 GTGCAA 13.36 TCTGAA 13.23 TCCAGG 13.15CTTTAC 13.11 GGAAAA 13.07 ATCCTT 13.06 GAAGGT 13.04 GATTAA 13 CAATTG12.98 CATGCC 12.96 TCTTTA 12.95 GATTTT 12.9 TTTGAT 12.87 CCACTC 12.84TGTACA 12.83 TATGCC 12.83 GCTGCC 12.82 ATGGTT 12.82 GTTCCA 12.79 ATCCCT12.79 ACTAAG 12.76 ATTCTC 12.75 AACCTC 12.75 CCTATG 12.71 GAATGA 12.69ACAAGT 12.63 TACTGT 12.62 AGGTAA 12.62 AACGAC 12.6 TCCGCT 12.59 TCAAAC12.55 GCACTT 12.49 AATGCC 12.48 ACGCTT 12.45 CAACGC 12.44 TAACTC 12.43TCTTAC 12.42 CTTCCC 12.42 ACACTT 12.38 TTTTAA 12.23 GAACCG 12.23 GGGAAT12.21 TTCTCA 12.16 TGCTCT 12.15 GTACAC 12.13 TTTTTT 12.1 GTTTCA 12.07CCCAAT 12.04 TTCAAG 12.03 TTGAAT 12 AGTATG 11.99 TAGTTT 11.98 CGACCA11.98 GCATGA 11.98 CAGGAC 11.97 GCCTCA 11.96 GTCTAT 11.95 CTATCC 11.89TGCCAT 11.88 CGATCA 11.82 AAGGAT 11.76 GTGGAT 11.71 CCATGG 11.69 TCAACC11.69 TCCCAA 11.68 GCTGAC 11.66 TCAAAG 11.63 GACACT 11.61 TCCAAG 11.61CGGCTA 11.53 GCCATG 11.51 GCCCAT 11.46 GAGCCA 11.41 GAAAGA 11.4 GCGTAT11.39 
AAACGG 11.38 CCCAGT 11.36 ACACGT 11.35 TTCCCC 11.35 GGCACC 11.33AGCCGG 11.32 TTAAAG 11.31 CTATAG 11.27 ATCTTG 11.27 TACTGG 11.23 CTCAAT11.2 GCTAAA 11.18 GGTTAT 11.16 TGCCAC 11.14 GAGACC 11.07 GTTACC 11.04AGGAGT 11.04 CCGCAG 11.03 CAAATT 11.02 CTTCTC 10.99 TATGTT 10.99 AATTTC10.99 ACCGCT 10.99 CCCGCT 10.9 CGATTA 10.87 ACATCG 10.86 CCGGCT 10.85TAGATC 10.82 AAGTTG 10.82 CTTGAT 10.79 TACCGC 10.78 AAAGGC 10.74 GATCTA10.72 TCCCCT 10.64 GATAGT 10.62 GGATAA 10.61 TGAGTA 10.57 GGAGTT 10.54ACGCAC 10.52 CCCATT 10.51 TGTAAC 10.49 GATTTC 10.48 TAACCC 10.46 AATGTA10.46 ACGGCC 10.46 TGCAGG 10.44 CTGTAC 10.44 AACATG 10.43 ACTGGT 10.38AAGGCC 10.36 TAAAGG 10.29 TATTGT 10.25 GGAGGA 10.19 AAGTGA 10.18 ATTTGG10.18 TGTTTT 10.17 CAAAGT 10.16 AGTCAC 10.14 CTGAGG 10.12 CTAGAT 10.11AATTGG 10.08 GGAAGA 10.08 CTCTTA 10.04 CTCTCA 9.99 GAACTT 9.97 AGAGAA9.94 GAGGAT 9.93 GGGAAA 9.93 CCCTGA 9.92 CCAATG 9.9 TCATGA 9.89 CCTTCT9.88 TCATTG 9.81 TACCCC 9.78 TTCTGA 9.75 AGAACG 9.72 ACGCTA 9.69 CTCCTA9.69 TCCGCA 9.59 TTCACG 9.57 CGAATA 9.54 ATTTTG 9.43 GCCACA 9.39 CCTAGA9.37 TTATCC 9.33 AGGCAC 9.29 GGCAAA 9.28 AACCCG 9.28 GTTAAT 9.27 AATGGC9.23 GTATAC 9.16 CAGTCT 9.12 CTCAGT 9.12 TTTATG 9.09 TGAGCC 9.05 GGTGAA9.04 TAAAAT 9.04 CACACG 9.02 GTACTT 9.02 TTACCG 9 GCCAGA 8.99 TCGCTA8.97 GGCTCT 8.95 GACAGA 8.93 GGAATT 8.9 TATTCT 8.89 CCGCAC 8.89 TGCCAA8.87 GCCAAC 8.84 GATCCT 8.82 ACGCCA 8.74 AAAAAG 8.73 CCAAAC 8.69 TAACCG8.68 TTGAGC 8.68 GCATTG 8.65 CACTGG 8.65 GTAAGA 8.62 GACAAA 8.62 CCCTTC8.61 TTAATC 8.61 GTACAA 8.54 ATAAGA 8.53 AATCCG 8.5 TTCTTC 8.39 CATGAG8.38 GAAATT 8.32 CATACG 8.31 TCTGAT 8.28 GACCAA 8.27 TAAGAG 8.24 GGATTG8.2 CAAATG 8.17 CCACGG 8.17 GAGAGA 8.12 GCTTAG 8.08 CAGCGT 8.07 GTGCTA8.07 TTAACT 8.05 TGATCC 8.04 AATGAC 8.04 GTAACC 8.01 CTCAGG 8 CGATAC7.98 CTTTTC 7.89 TTCAGT 7.81 CCCCTT 7.75 TGCACG 7.71 TTCTTA 7.71 TAATGC7.7 CCTGAG 7.69 TATCCG 7.64 GACATC 7.64 GACCCC 7.61 CTTTGA 7.6 TTAAGA7.56 CACGAC 7.55 TAAATT 7.54 ATTGAC 7.51 AGAAAG 7.5 TTTGCT 7.5 CCAAAG7.46 CACGGC 7.4 GTTTTT 7.39 TGTGAA 7.37 
GTAATC 7.37 CGTATC 7.36 TACGAT7.3 GGACAT 7.28 CCCTTA 7.23 GATTTG 7.22 ATTTGT 7.2 ACATGT 7.19 CACGCT7.18 TGCTGG 7.14 CACCGA 7.05 ATCCGA 7.01 TAGTTC 6.93 CTGGAC 6.9 CCTCAC6.9 TGAATG 6.89 GCCCAG 6.83 CGGCTG 6.82 CTTGTA 6.77 AATCTC 6.73 AAGAAG6.68 GAATTG 6.67 AAGGAG 6.63 TAGGCT 6.62 TTTGTA 6.58 TTCTAA 6.55 TCTCAG6.51 ACCCTA 6.51 TTATGC 6.47 CTGGGA 6.46 TTTGGA 6.43 CTTTGC 6.39 GGAAAC6.38 AACCGA 6.33 ACGATG 6.33 GCTACG 6.32 CTTTAG 6.28 GCAGGC 6.25 CTGCCC6.22 TTCTTT 6.2 GCACTG 6.19 ATAGTC 6.11 GCTCAC 6.11 ATTGGT 6.09 GTACTG6.09 GGTATC 6.07 CCCAAA 6.05 CATTGT 5.96 GTGCAC 5.86 GTTTTA 5.81 GCAAAC5.79 CGCACC 5.79 CTACCG 5.78 GGGATA 5.77 ACAGGT 5.76 GCTGAG 5.75 AAATGT5.7 TGTAGT 5.67 TGATGG 5.64 ATGCCC 5.63 TTTCCC 5.63 GCCAAT 5.59 AAGGTA5.58 GTATCC 5.56 TGGACC 5.48 AGGCAT 5.46 GATGGT 5.44 TTCCTT 5.44 TGGAAG5.39 CCTATC 5.33 CGGACA 5.31 AGGGCT 5.22 TTTAAC 5.22 TTGTAA 5.21 ATAGGC5.18 TGTTAA 5.15 TGACTA 5.12 CCCCTA 5.11 AGATGT 5.1 GACAAT 5.09 GATCAA5.07 GCCAGC 5.05 TCATCC 5.04 AGTTAA 4.96 TCTCAC 4.95 ACGGCT 4.94 TCTATA4.87 GTAGGA 4.85 TTTCTA 4.85 CAGAGG 4.84 TTTTTG 4.77 TCCTAT 4.76 GAAGGC4.74 TCAGAC 4.73 GCAGCG 4.71 AGTGGA 4.7 CCACGC 4.69 TTGTTA 4.62 CTTAAA4.62 ACTGCG 4.61 GTTCAC 4.59 TCAAGG 4.58 AGGATG 4.56 CCCTGT 4.46 CAAAGG4.45 TTTAAA 4.39 TTATGG 4.38 CTAGAA 4.37 CCGTAA 4.36 TAGCCG 4.36 ACTTTG4.36 GACTGA 4.33 TCACAC 4.31 GGTAGA 4.27 GACTGC 4.25 AGATTG 4.24 CGGCTT4.23 ATGTCA 4.23 TCTTGA 4.2 CTTTTG 4.2 TGTAAA 4.2 GCTTTG 4.19 CCAAGT4.16 TGTACC 4.15 AAAGTT 4.14 ACCGTA 4.1 TACGAA 4.04 CTTATC 3.94 CCTCAA3.94 ACCCGA 3.93 GTTGAT 3.93 TGCTGT 3.92 GTTCAG 3.91 TGGTTA 3.91 AAAACG3.88 GCGCAG 3.86 CCTTTC 3.85 TCTCAA 3.85 ATCTAG 3.83 GAGATT 3.8 ACGACA3.75 TAGACT 3.73 TGTATG 3.7 GCTAGT 3.7 TAAGCC 3.7 AAAGGT 3.68 CTAAAT3.65 CAGTGT 3.61 GAGTTC 3.56 AGGGCA 3.54 CGCTTC 3.53 TACCGA 3.51 TCCTCA3.51 AGCAAG 3.5 GAAGCG 3.49 GCCTTA 3.43 TTAGTT 3.4 ACCGAC 3.39 GCAGGA3.39 ATGCGA 3.38 ACGAGC 3.35 GCAGGT 3.33 AGGGAT 3.33 CAGGGC 3.29 AAGGGA3.26 AGCGGC 3.25 GACCCT 3.25 CGCCAT 3.18 GTGAAA 3.17 AGAGGA 3.16 
GGGATT3.16 ACGGAT 3.13 TGCTAG 3.1 TATGCG 3.06 GACCTG 3 TTGGAT 2.99 TACTTG 2.98GACAAG 2.95 TATGAG 2.93 GACTCT 2.87 GTTGTA 2.85 GTCACC 2.84 CATGTC 2.82TGGTAC 2.78 CTCCTT 2.78 ATCTGT 2.78 AGGACT 2.76 GGTAAC 2.76 TCCCAT 2.75CAATTT 2.73 GCTGGT 2.69 ACGATT 2.63 CGAACT 2.6 GACACG 2.58 ATGTGA 2.58CCTAAA 2.57 TGGCAT 2.49 CTGGTA 2.48 ACTTTC 2.47 GAGTAG 2.46 TTTCCT 2.4CCACAC 2.39 TGTTCA 2.38 AACTTA 2.38 TGTTGA 2.35 GAAAGG 2.33 ACGGCA 2.33GAGCCG 2.32 TCTTAG 2.32 CAATGT 2.29 GTCCAT 2.28 ACCGCA 2.24 CTCCTG 2.22CTAGAG 2.19 TCATTC 2.19 AAGGCA 2.18 CCCTTT 2.15 AGGTTC 2.11 CTTAAC 2.1TTGACC 2.07 GCTTTC 2.06 AGACAA 2.06 TTTCTG 2.02 GGTGAT 2.01 CCTCAT 1.99GAGAGC 1.95 GCCTTC 1.91 TGATGC 1.88 AGAGGC 1.87 GATGAC 1.87 GTTTCT 1.83TAACGA 1.8 CTTACC 1.79 ACTGAC 1.72 ACGCAA 1.7 CGAATC 1.69 GGACAG 1.64GCCGAT 1.64 TGGGAA 1.62 AGACGC 1.6 TTACCC 1.58 CAACCG 1.55 CCCTCC 1.51TTCAGG 1.48 TCACGA 1.48 TGCTTT 1.44 AGGGGA 1.42 ACGGAC 1.41 CTCCCC 1.38ACCTTG 1.35 AGAGTA 1.3 GCCAAA 1.29 AAAGTG 1.28 CCCCTG 1.21 TTGAAC 1.21GATGAG 1.21 GCGCTG 1.2 TCAATG 1.17 CTTGGA 1.16 AGGGAA 1.14 GTTGAA 1.14AGAGTT 1.08 AGACGG 1.08 TTGGAA 1.05 TCTCCC 1.02 CTCTAA 1.01 TCTGAG 1TCGATT 0.95 ACGAAT 0.83 TGGAGG 0.82 CATGGT 0.82 GAAGAG 0.81 TTCCTG 0.78CGCTTT 0.75 CGGAGA 0.75 GATAAG 0.72 GGCATT 0.71 GGCAGT 0.67 ATTCGA 0.67CATTTG 0.59 TCTTAA 0.58 ATTGAG 0.55 TTTTCC 0.54 CAAAAC 0.47 AGTGAC 0.47GCCTCC 0.45 GACGCT 0.39 CATCCG 0.39 CTATGG 0.38 TCATGG 0.37 GGGACA 0.36CCTGCC 0.36 CAGGGA 0.34 TTCGCA 0.32 AAGGTG 0.25 GATGTT 0.2 TTTTAG 0.18TGGTGA 0.16 CTGTGA 0.14 GGCAGA 0.11 GTGTAT 0.1 CCCTAA 0.09 TCTCCT 0.06ACTCGA 0.05 TACCTC 0 AATCGC −0.05 ACTTAA −0.05 CTCAAA −0.06 GCCCCC −0.1GGTTAA −0.11 GCGAAA −0.15 CTAGTT −0.16 TCCCCC −0.21 AACTTG −0.22 CTCCGC−0.27 AAACGA −0.29 TGCCCC −0.34 CGCTGC −0.35 AAAAGG −0.35 TGCATG −0.38CAGACG −0.39 TGACAC −0.39 CGATGA −0.4 TTAAGG −0.41 TTGGAG −0.41 GCCCAA−0.41 AGGTTA −0.42 ATTTAG −0.45 AGATTA −0.46 AGGTTT −0.49 GCCTAT −0.53TCATGC −0.55 CTCATG −0.58 CAGTCC −0.59 GTATGA −0.64 CCTCTA −0.65 CATTCT−0.65 CCGACA −0.73 
AGTTAG −0.81 GCCAAG −0.86 ATTCTG −0.86 GAGTTG −0.88AAAGAG −0.91 TGTGCT −0.96 TCTAAG −0.96 AAACTT −1.01 GCGGCT −1.04 TTAGAC−1.04 TTAAAC −1.08 AAGGTT −1.14 AGTTGG −1.15 AGAGGT −1.2 CCCTAG −1.2CCGCTC −1.21 GCATGG −1.24 GCTAGA −1.26 ACGATC −1.27 CGTGCA −1.27 TTTAGC−1.32 CTCATT −1.33 CGCAGT −1.35 AATTGT −1.36 TGACAG −1.37 ATGCCT −1.38AAGTTC −1.41 CTTGAC −1.45 TTTTGA −1.48 ATAACG −1.48 GCATTC −1.49 ATCGGC−1.51 GTAATG −1.54 TAAACT −1.55 GAATGG −1.56 AATTTG −1.6 CCTCCC −1.61CGGATT −1.62 TTGAGA −1.64 GTGAAG −1.66 GCCCCT −1.7 CGTTTA −1.73 GAGGAA−1.76 CGTTCA −1.77 TTCGAA −1.81 ATCGAC −1.83 TTTTTC −1.87 TGCGCA −1.89ACCGAA −1.9 CTGCGC −1.93 AAGTCA −1.93 TTACGA −2 TGGACT −2.05 TACGCT−2.06 GAGGCA −2.1 TTGATG −2.12 ACCGAT −2.13 TACTCT −2.17 CGCCAG −2.18GAGTTA −2.18 CACGTT −2.2 CTGCGA −2.22 GTGCTT −2.23 AATGAG −2.24 AGTGTA−2.25 CTTATG −2.26 TCTCTG −2.27 CCTAAT −2.29 GGAATG −2.29 CCATTG −2.34CGATAA −2.35 ACTTGA −2.35 TTGGTA −2.35 TAGAGA −2.36 GACATT −2.38 GGGAAC−2.38 TGACAA −2.38 GTGCAG −2.42 CGGCTC −2.43 ATTGTT −2.45 ATGAGT −2.46GGATGA −2.48 GTTCTA −2.49 GTTAAA −2.5 ATGTTC −2.57 CCTAGC −2.61 CCCTAC−2.61 AGAATG −2.65 CGAAGC −2.7 CGGTAA −2.71 CTAATC −2.72 ACCTAA −2.76GCGCCA −2.8 GTCCCA −2.83 CGAGCA −2.88 TCAGGT −2.9 AGAGTC −2.92 GAGGAC−2.92 ACGAAA −2.95 AGGCAG −2.97 GGACCT −2.98 TCACTC −3.01 GACTGG −3.03CTTGAG −3.03 CGAGCC −3.07 GGCTGT −3.1 GCCGCT −3.11 GGACAA −3.11 TACCCT−3.12 GTCAGC −3.12 CTGTTC −3.18 CCGAAT −3.21 AGAGCG −3.21 ATGGTG −3.29TCCTTT −3.3 CATGAC −3.31 TAGACC −3.31 GGACTC −3.32 CCCTGC −3.32 GGAAGG−3.35 GGTTCT −3.38 GCAATC −3.41 AGTCTT −3.46 TACGGA −3.49 CGCACT −3.51GCCTGC −3.57 GGACCC −3.57 GCCTTT −3.58 TTTAGT −3.6 GGTGCT −3.6 CGACTC−3.65 GGATAG −3.69 GGATGC −3.7 ACTCTT −3.73 ATTGCC −3.84 TGAACG −3.84CTTTTT −3.89 GAATGC −3.91 TATAGG −3.92 GTATAG −3.93 GAGCGC −3.96 ATTGTG−3.97 TCAGAG −3.97 GGGATC −3.98 CCGCCA −4 TGTCCA −4.01 TGTTCT −4.03AGGCCA −4.04 CCTTAC −4.05 TTTTCT −4.07 CATCGA −4.09 AGCGAA −4.12 AAAGGG−4.12 GGGAGA −4.13 CTGAGT −4.13 GAAGTC −4.15 CGTAGC −4.16 CGGCAC 
−4.18TGCGAA −4.19 TCTTTT −4.21 ACGGAA −4.22 CCGACT −4.25 ACCTGT −4.26 ATCGTA−4.29 TATGGT −4.29 TAATCG −4.31 CGATTC −4.32 GGGAGC −4.38 CTCTAC −4.38CGTACA −4.41 CAAGTT −4.42 TAAGGC −4.46 AAGCGA −4.46 GGTACC −4.48 GACAGT−4.49 CCGCAA −4.53 GCTAAC −4.61 TCCCTA −4.62 CAGGTG −4.63 CAATGG −4.64TAGTCA −4.67 TAGACG −4.67 CGTGAA −4.7 AGACGA −4.7 AAGCGT −4.74 TGGGAT−4.81 CCGAAG −4.83 CGAAAA −4.87 AGCCCG −4.93 GGCTGG −4.96 GTCACT −4.99CAACGA −4.99 TGACCC −5 GCCGCA −5.04 GTTCAA −5.06 TCGCTG −5.07 GTGAAC−5.11 CCTTAG −5.16 ATAGGG −5.17 CAGTGC −5.18 AGGCGA −5.2 CGAACC −5.2ACTCCG −5.21 CTCCTC −5.24 GGTCCA −5.25 AAATTG −5.27 CAAGTC −5.27 TACCCG−5.28 CTTTCC −5.29 GCACTC −5.29 TTGGCA −5.3 ACTTGC −5.32 AGTCCC −5.32TGGCAC −5.33 GTGGAA −5.33 GGCCAT −5.36 GCGGAT −5.39 GCGCAT −5.4 GGGGAA−5.4 TCTAGA −5.4 ACTTGG −5.44 TGCGAT −5.45 GCGATA −5.45 TGCCCA −5.45TGGCTT −5.48 AGAGAG −5.48 TTGCTT −5.51 AATGTG −5.57 TTACGG −5.57 AAGGTC−5.59 TGAGAC −5.62 GACTGT −5.69 TTAGTG −5.71 CATTGG −5.71 CAGGTC −5.73TCCCTT −5.8 CGAATT −5.82 AATGTT −5.82 GGCAAT −5.83 TAGGAC −5.91 TACGGC−5.94 TCTTCT −5.95 GGGCTC −5.97 TCGCAT −5.98 CTAGGC −5.98 CCTTTT −5.99CCAGTG −6 CACGAG −6.01 TCCTAA −6.03 TAGGCA −6.08 TCTAAC −6.1 CACCCG−6.13 CTACGG −6.14 AGGTGC −6.16 CCCATG −6.17 ACGCCC −6.17 CGATCC −6.18GAAACG −6.2 ATGTGC −6.21 GCAAGC −6.24 AAATCG −6.25 CCTCTC −6.29 ACCGGC−6.31 TTTAGA −6.33 CGACTG −6.33 AGGCAA −6.33 GGACAC −6.35 TAGCCC −6.37TCTGGA −6.37 TAAAGT −6.37 TGAGTT −6.37 AAACCG −6.45 ACCCGG −6.51 CCTGAC−6.51 AAATTT −6.52 AACTCG −6.52 AAGGGC −6.52 TTTTGC −6.54 GGAAGT −6.61GGTTAC −6.66 TCGTAT −6.68 GTTCTC −6.7 GGAAAG −6.72 TCCTTG −6.72 GCGAAT−6.75 AGTCTC −6.77 GGCACT −6.8 GCTCTG −6.8 CTACCT −6.8 TTGACA −6.81AGCGTA −6.81 AGCGTT −6.83 TCGCAG −6.85 CGAAAT −6.88 GCCCTT −6.9 CATCGT−6.91 AATTAG −7 GACGAT −7 AACCGG −7.03 TTGCCA −7.04 CTAGCC −7.1 CACGGT−7.11 CTCTTC −7.13 AACGCC −7.15 GTTATG −7.15 ACTGTG −7.15 TAATGT −7.16CCAACG −7.21 GCCTTG −7.21 CCTTGT −7.23 TTCTGC −7.23 TAAGAC −7.23 GCTGTG−7.24 CCCCTC −7.25 GACTAA −7.25 CGCTCC 
−7.27 GCGACT −7.27 TTCCCT −7.28CGCCCA −7.29 TGCGCT −7.33 CCCCCC −7.34 TTAGAG −7.34 CCTGTT −7.36 TCTTCC−7.38 CCATCG −7.38 TCTAAA −7.39 CTAATT −7.42 AAGCGC −7.42 CTCTGT −7.48AGGCCT −7.48 TAGGAA −7.51 GTTCTT −7.54 GTATTC −7.63 ACGAGA −7.68 ATGTTT−7.68 GGGTAT −7.76 ACTAGG −7.77 CGGAAA −7.79 ATGGCC −7.82 GTTATC −7.85TCGACT −7.87 CTTCTT −7.9 AACGAT −7.91 GATCGA −7.94 CTCTAG −8 CTAACC−8.08 CTAGGA −8.08 GATTGT −8.1 CCCGCA −8.1 CGAGTA −8.11 TGGGCT −8.15GGCTTA −8.19 TCGGCT −8.24 GATGGC −8.25 ACGCAT −8.3 CCGCTT −8.3 TGGCTG−8.32 ACTCTC −8.35 GCCCAC −8.37 CGCTGG −8.37 TTGCTC −8.38 TGGTAG −8.38CTCTGA −8.49 TACTCG −8.59 TGAGAG −8.6 GCACCG −8.61 ATGGGA −8.69 TGACTG−8.7 CGATTT −8.72 CGGAGC −8.74 CGGATC −8.78 AGTCAA −8.79 TTCCTA −8.79CCTAGG −8.79 GTTGGA −8.82 AGTCTG −8.83 CAAGGT −8.85 AATTCG −8.91 ATTCGC−8.93 GAAAGT −8.99 CTAGAC −9 AACGAA −9.02 CGACAA −9.03 GCCTAG −9.04AAGTGC −9.06 GGTGTA −9.06 GATAGG −9.08 TTTGCC −9.1 TTTAAG −9.1 CCCCCT−9.1 CGATAG −9.13 ATCCCG −9.15 GTCACA −9.21 GTCCAG −9.24 CAAACG −9.25AGGCCC −9.29 AGGGAC −9.3 CTGACC −9.3 GCTGCG −9.34 TTTCTC −9.36 CGACAG−9.37 TGAGGA −9.41 CCAGGG −9.43 AGTCTA −9.43 GCCGAA −9.48 TCCCTC −9.49AAGTCT −9.51 AGGGCC −9.52 GCAGTC −9.54 ATGTTG −9.55 GTAAAC −9.56 GAGTTT−9.58 ATGCGC −9.6 CTCCCT −9.65 TTTTGG −9.67 GTCAAT −9.69 TAGGAG −9.7CTTCGA −9.71 AGTTTA −9.73 GTAAAG −9.78 CGCCAC −9.81 GACAGG −9.82 AGGAAG−9.83 ACGTAT −9.85 GAACGC −9.88 AAGAGT −9.91 CACTTG −9.92 GCGATT −9.92CGCCAA −9.93 GCTTAC −9.94 TGACTT −9.94 CATGTT −9.95 TGATTG −9.97 TCACGG−9.98 TCGAAT −9.98 CTCTTG −10.02 GTGATT −10.03 GAACGA −10.03 TGTTCC−10.05 TGTTTC −10.07 TCTTAT −10.08 GAGACG −10.09 CGGTTA −10.12 GCATGT−10.13 GGATGT −10.15 CCTTGG −10.18 GAATCG −10.2 GGGCTG −10.21 TAGAGT−10.25 TAGCGG −10.25 GCAGTG −10.25 GTCCAC −10.28 GAGTAC −10.33 CCACCG−10.36 CGACAT −10.37 GGGGAT −10.38 CGCTAA −10.4 CCGTTT −10.41 TCTAGC−10.42 GGGATG −10.45 CTGTGC −10.48 CTAAGG −10.48 TTGATC −10.5 ATTGGC−10.52 AGCCGT −10.56 ACTGGG −10.56 CTGGCT −10.58 ACGCCT −10.59 ATACGT−10.63 GGTAGC −10.65 TGTCAT 
−10.65 GATGCC −10.66 GGTTTA −10.7 GTGCTC−10.84 TAAGGA −10.86 CTTAAT −10.91 GATCCG −10.94 CGAGAT −11 GGCGAA−11.02 CCGCAT −11.03 GGCGCT −11.04 GCACGA −11.04 TGCCGA −11.07 GGCATC−11.1 TCGGCA −11.1 GATTAG −11.14 TCCTTA −11.15 CTAAAC −11.17 CGGAAG−11.23 CTTTGT −11.26 TTAGGA −11.27 CCGGAT −11.36 ATTAAG −11.38 GTGCTG−11.41 CTCTCC −11.45 TATTCG −11.47 GCCCTG −11.51 TCGCCA −11.54 TGTAGA−11.62 CTAGTG −11.62 CCGCCC −11.66 CAAGCG −11.66 GGTGGA −11.74 ATTAGG−11.75 GCCTAC −11.77 CTCACT −11.78 AAGCGG −11.79 AACCGT −11.81 AGATCG−11.85 TGACTC −11.92 TTCTTG −11.94 ATCGCC −11.99 ATCGAA −11.99 GGTTTT−12.02 TGGCAA −12.04 CGCCTT −12.06 TTGTAG −12.07 ACTTGT −12.08 TGGTTT−12.08 ACTCGG −12.09 TATGGC −12.1 TTGGTT −12.12 GCGATG −12.19 CAGGGT−12.2 AGTTTG −12.24 TAATCT −12.24 AAACGT −12.25 CGCAAT −12.28 CCCTCT−12.28 GGGCAT −12.33 AGTGGC −12.33 GCCAGG −12.34 TAAGGT −12.35 GGCCTT−12.37 GGGAAG −12.37 TGCCTA −12.4 CCGTCA −12.43 GTATTG −12.44 GTGACA−12.48 CGGCAG −12.51 TGTGAT −12.53 GACGCA −12.56 CAAGGG −12.58 GAGTCA−12.63 GCCGAG −12.66 CTTTCT −12.68 GACTTT −12.69 GGTCAA −12.72 TCGCAC−12.75 TCTTGC −12.82 CCTTTG −12.82 TTCGCC −12.88 TGGTCA −12.91 GCGCTC−12.95 GAAGTG −12.95 GCCTCT −12.96 AGGTGG −12.96 CAGTGG −13 GTACCC−13.02 TTCCTC −13.04 TCGACA −13.05 TGGCAG −13.07 CCGAAA −13.09 CTGCCT−13.11 ATGGGC −13.12 ACCGAG −13.13 CGTAGA −13.16 GGGCTT −13.18 CCGAGC−13.19 GACTTG −13.23 CTGACT −13.26 GAGGGA −13.28 AGTCGA −13.32 CCCGAG−13.32 CTTCCT −13.32 TCACGC −13.37 TAGGTG −13.39 CCTCTG −13.41 GCGACA−13.46 GCTAGC −13.46 TCATCG −13.48 CCCGCC −13.49 GTCCAA −13.5 TGGAGT−13.55 ACGAGT −13.6 CCCGGC −13.6 ACGGTA −13.65 TCCTGG −13.65 CGATCT−13.73 CAATCG −13.76 CTACGC −13.79 ATCACG −13.84 CGCTCT −13.89 CCCGAT−13.92 CGGTAT −13.94 AAGTCC −13.95 GGCATG −14 ATGAGG −14.05 AGGTCT−14.05 CTTAGT −14.06 ACTCGC −14.08 ACGAAG −14.09 GGGACT −14.1 AAGACG−14.11 TCCTGC −14.11 GCTCTC −14.12 TCTACG −14.14 TTGAGT −14.15 TCGAAC−14.16 CCCTTG −14.27 GTTCCT −14.28 GTCTCC −14.29 ACCGTC −14.35 TCTTTG−14.39 GGTTCC −14.39 GTTCCC −14.42 CGCTGT 
−14.51 CAACGT −14.53 CAGGCG−14.56 TACGCC −14.59 CGAAAC −14.6 TCTTTC −14.65 TGCCCT −14.67 GCCCTA−14.68 GTTTTC −14.68 GTATGC −14.7 GAAGGG −14.72 CGAAAG −14.79 GATTCG−14.79 CGATGC −14.9 TGAGCG −14.92 ACGCGG −14.93 CTCGAG −14.94 TGCGGA−14.95 ATGTCC −15.01 CGGCCA −15.02 ACCTAG −15.05 GTCAAA −15.06 GTGCCA−15.08 CCCCGA −15.11 CTGGCA −15.12 AAGGCG −15.13 GATTGC −15.14 TTTGAC−15.14 GTAGGT −15.16 GTTGTT −15.17 CCTAAC −15.17 GGACTT −15.18 CGTAAA−15.2 TCATGT −15.21 GGGACC −15.22 GGGCAG −15.23 CTGGTG −15.25 GGATGG−15.27 CCGTTA −15.31 GACGCC −15.32 CGCATC −15.33 ACGCTC −15.33 AAAGTC−15.35 GGGGCA −15.38 CTCGCA −15.38 GCACGG −15.39 AGCGAG −15.4 ACTGGC−15.44 CTGTCA −15.51 AGCGTC −15.52 GAGGAG −15.53 GTGTAA −15.58 TTGTAC−15.6 TCAGTG −15.65 GGCGCA −15.71 GCGAAC −15.71 TCTCTA −15.73 CCCGAA−15.75 TGAGGC −15.76 CCCCGG −15.78 CCTCGA −15.83 TATCGG −15.85 ATCCGT−15.86 AGCGGG −15.87 CCCACG −15.87 ACTGTC −15.88 GTTTAA −15.92 TAGTGG−15.97 AATGGG −15.99 ATCGAG −15.99 GTCCTT −16.01 AACGTG −16.03 CGCAAG−16.03 GGCCCT −16.05 CACGTA −16.06 TAGGGA −16.09 CGGCAA −16.12 CCTAAG−16.15 TCGAGA −16.16 GCCTGA −16.16 GACCCG −16.19 GTTAGA −16.27 TGCTTG−16.27 TCGAGC −16.29 ACGGTG −16.32 TCGATC −16.34 CAGGGG −16.36 GAATGT−16.41 TTGACT −16.46 TCAGTC −16.47 GCTCGA −16.48 AATCGT −16.48 GCCCTC−16.49 GACGGA −16.49 AAGAGG −16.52 CGTTAC −16.52 ATCCGG −16.55 TTATGT−16.55 CTTCGC −16.56 GAGTCC −16.57 GAGAGG −16.6 TGTCTT −16.61 AGAGTG−16.64 ACCCCG −16.65 TAACGG −16.65 CTCGGC −16.66 TAGAGG −16.67 CTTCCG−16.75 AACGGA −16.76 AAGTTT −16.76 GCCTGT −16.77 AGTCCT −16.79 GAACGG−16.8 GGCAAC −16.84 CTCGGA −16.85 TCGATA −16.85 ATGGGG −16.85 GGAGGC−16.88 CCGCCT −16.93 CCTCCT −16.95 AAGGGG −16.95 ACCGTG −17 GCCTAA−17.04 TGGGAC −17.08 TGGATG −17.08 TATCGC −17.09 GGACTA −17.1 CGAAGG−17.11 TCTAGT −17.14 GTCAAC −17.15 TTCTAG −17.16 CGAGAC −17.19 AAGGGT−17.2 GCGAAG −17.21 GCAAGT −17.22 CGGCCC −17.26 ATTTCG −17.3 GTGGTA−17.33 TGGTTC −17.37 GCATCG −17.37 GTACTC −17.39 ACGTAA −17.4 CTTGTT−17.4 GGACTG −17.41 GCCGTA −17.43 CCTGGC −17.44 AACGAG 
−17.52 CGCAGG−17.55 TCTTGG −17.58 AGACCG −17.62 TGCGAC −17.65 CAGTCG −17.66 GCCGTT−17.66 TAACGC −17.67 CGACAC −17.69 CCCGGA −17.72 GTCATT −17.72 ATCGGA−17.74 CCGAGT −17.76 GGTTTC −17.8 CGCATT −17.82 CCTTGC −17.83 TCTGCC−17.83 GCAAGG −17.84 CCCTGG −17.85 GTTTAC −17.87 AGGTCC −17.91 GCTTGT−17.93 CCGATC −17.95 TCGAAA −17.95 CTTGCC −17.99 TCCGAC −18 TATCGA −18GATTGG −18.08 CGTTAT −18.09 TATCGT −18.16 TTTCGC −18.18 AAGTGG −18.22GGCCCC −18.22 GGCCCA −18.24 ACCGGA −18.25 TCAGGC −18.26 CGTCTA −18.28GTCATC −18.3 GACCTA −18.31 TTGTTC −18.38 TCCTAG −18.4 ACCCGT −18.48ATCGTG −18.49 TTGGAC −18.49 CGGAAT −18.51 CAACGG −18.61 ACCGTT −18.62CCGTAG −18.63 TGCCAG −18.65 TGTTAG −18.77 CGACCC −18.77 TTGTTT −18.77TCGCAA −18.77 ATGGTC −18.81 CGTGGA −18.81 TCCTCT −18.83 TCGCCT −18.84TCGGAT −18.89 GCGACC −18.9 ACGTTT −18.91 TTAGCC −18.92 CTCTTT −18.92ACTTCG −18.95 CTATCG −18.96 GCGCAC −18.96 TCGAAG −18.97 TTATCG −19.01TAAGTG −19.03 TGATCG −19.03 CTCGAT −19.04 CTCGAA −19.13 CTCACG −19.18GGCTTG −19.19 CGCCGA −19.19 CTTGCT −19.24 GTAGTC −19.28 CACCGG −19.31TTTGTT −19.31 TCTGTT −19.32 TTTACG −19.34 GTCCCC −19.35 CGAGGA −19.35CGGATG −19.35 CCGATG −19.37 CATCGG −19.38 GGTAGT −19.38 CCGTGA −19.41TCCGCC −19.41 TCTCTT −19.42 GGAGAG −19.43 CATTCG −19.47 CGAATG −19.54TCTCTC −19.56 GGCCAA −19.57 GCTGTC −19.65 ACCGCG −19.66 GTGAGA −19.68GGCCAC −19.7 CCTAGT −19.71 TCTTCG −19.73 GTGATC −19.73 ATGTAG −19.77GTGACT −19.79 GACGGC −19.8 AGGGGC −19.83 ATTCCG −19.84 GTTTCC −19.85GGCAAG −19.96 CGGCAT −19.96 TCCCGC −19.96 AGTGTT −19.97 GCCGAC −19.99CCGATT −20.01 ATTCGG −20.03 TACCGT −20.03 TCAGGG −20.08 GTTTGA −20.11GCTCCG −20.13 CCTGGT −20.17 CCTCTT −20.18 ACGTGA −20.22 GTCTAA −20.25TAAGGG −20.27 TCCCCG −20.29 CACGTC −20.32 GGCAGG −20.33 CGTAAC −20.35GAGGCC −20.36 TAGTCT −20.36 AGGGAG −20.39 ACTCGT −20.39 CGCTTA −20.4GCGGAA −20.46 GGCTAA −20.5 CCTTCG −20.52 TAAGTT −20.52 TTGGCT −20.53CCGGAG −20.53 ACGCCG −20.58 GTCTCT −20.59 CCGAAC −20.66 AGGGTT −20.69GGTCAC −20.72 AGTGGT −20.73 TGTCAC −20.75 CCCCCG −20.77 
TTCGAT −20.79CGTAGT −20.82 GCGGCA −20.82 TCCGAG −20.86 TCAAGT −20.87 CCGTCT −20.88GGAGGT −20.93 CTGACG −20.94 TGCCTC −20.94 AGTCAG −20.95 TTCTCT −20.97CGGTTC −20.97 TGTCTA −21 TCTCCG −21.04 CACTCG −21.05 TGACGA −21.15GTCTCA −21.17 GTCAAG −21.27 CTTGGC −21.28 ACGTCC −21.28 CGGTGA −21.32TTGGGA −21.4 TCGTAA −21.4 CGGAAC −21.42 GGTATG −21.43 ACCGGT −21.5CCGGAA −21.51 TCGTTA −21.53 AATGTC −21.55 CATGTG −21.55 GCGAGA −21.58TTTAGG −21.6 GAGGTA −21.69 CCGGCC −21.71 TGTGGA −21.72 CTCTCT −21.73GTGGCT −21.74 GCCGCC −21.76 GACCGA −21.76 GGTCAT −21.8 TCCCTG −21.84GCGCTA −21.87 TCCGTA −21.87 TTGTTG −21.9 GTCCTA −21.93 GCCACG −21.95TGCGTA −21.97 TCCGAA −21.99 GCCGGA −22.01 GAGCGG −22.07 TTCTCG −22.07GACGAA −22.08 CTGCCG −22.11 CTGGTT −22.11 AGGCCG −22.12 GAGTCT −22.25ATGGCG −22.26 GGGCAC −22.28 AGTCGC −22.31 GCGGAG −22.37 TCTCGA −22.4GACCGC −22.5 CTCGAC −22.51 ACGGGC −22.53 GCGCAA −22.56 CTCCCG −22.58GTATGT −22.6 GGGCAA −22.61 ATCTCG −22.63 AGTGCC −22.66 GTCTTC −22.66CGGGAA −22.68 CGATGT −22.69 GACTTA −22.7 CGCGCA −22.71 GACGAC −22.71GGGGCT −22.72 TCCCGA −22.78 TCAACG −22.81 CGTCAA −22.81 GATGGG −22.81TGCCTT −22.82 TACGGT −22.84 TTTGCG −22.87 CGCCCC −22.92 GAGGTC −22.93ATTGGG −22.97 CGGACT −22.99 AACGTA −23.02 ACGTTC −23.04 GACTCG −23.1CTGTTG −23.16 GCTGGG −23.19 CGTTTT −23.21 TACGAG −23.26 GCCAGT −23.28TTCGTA −23.29 CCTCCG −23.29 TTCTGG −23.3 GGGGAC −23.32 GATCGC −23.32CCCGAC −23.33 CGGGAT −23.34 GTGTTA −23.34 GTAGAC −23.35 GCAGGG −23.38AGAGGG −23.41 ACGGTT −23.42 CGCCTA −23.43 GGGCTA −23.43 GTACGA −23.43TTTGAG −23.44 TACGTA −23.44 GTGACC −23.47 CTCGCT −23.49 ATTGTC −23.58TTAAGT −23.58 TTACGC −23.64 GTTAAG −23.65 CCGGCA −23.74 AACGGT −23.77CGAGTT −23.8 GGCCGA −23.81 GCGGTA −23.82 GCAACG −23.84 GCGATC −23.9CTCCGA −23.94 CGGCCT −23.97 TCCGAT −23.99 AGACGT −24.01 TTTCGA −24.02TTTTGT −24.03 ATTCGT −24.04 TCACGT −24.05 CCTGGG −24.05 TGTAAG −24.09AATGCG −24.13 CGTTCT −24.15 CCGAGG −24.17 TCTAGG −24.2 TGGGTA −24.22GTGTTT −24.23 TGATGT −24.25 TAGGTT −24.27 ACTTAG −24.29 AACGTC 
−24.31AGGTTG −24.34 GTAGCG −24.4 GTTAAC −24.41 TATGGG −24.43 TAGCGC −24.44CGTCAC −24.48 TTCCGA −24.5 GACTAG −24.54 TGGGGA −24.57 GCGCTT −24.58TTCTGT −24.59 GGAGTC −24.6 CGCCTG −24.62 CGATTG −24.63 GGTGAC −24.68TCGTAG −24.68 TGTCAA −24.69 GGGTTC −24.7 TTCGAC −24.76 TGTGTA −24.79GAGTGA −24.81 GACGAG −24.82 CTAGGG −24.83 GTTGAG −24.87 TGACGC −24.87CGCAAC −24.94 CGCCTC −24.96 GAGCGA −24.96 CAAGTG −25.01 TGGTGC −25.01ACGGCG −25.02 CGAGGC −25.03 TACGCG −25.05 CATGGG −25.06 CTGTCC −25.07GTAAGC −25.1 CGTGTA −25.17 ATGCCG −25.2 ACGTAC −25.24 TTCCGC −25.28GTCTTT −25.31 TCCGGA −25.33 TTGTGA −25.35 AGTTCG −25.37 AGCGGT −25.47GCCCGA −25.51 CTGGGC −25.54 TAGTGC −25.55 TTGCCC −25.57 TCTTGT −25.63TGCGCC −25.65 CGAGAG −25.69 TATGTC −25.72 TGTCCT −25.75 AATCGG −25.77TTTCCG −25.78 TATGTG −25.8 TGGGCA −25.88 GCTTGC −26.03 TCGACC −26.05TTAGCG −26.06 CCGTTC −26.08 CTAACG −26.09 GGCGAT −26.11 GTTAGC −26.11GTGGCA −26.14 CCGGGA −26.15 GCCTGG −26.17 CTTAGG −26.18 AACGCG −26.19CGCGAA −26.21 ATCGTC −26.24 CTTGGG −26.27 GCACGC −26.29 GAGAGT −26.3GCATGC −26.3 ATCGTT −26.33 GAGGTG −26.36 TTAACG −26.36 CTGCGG −26.38ACGGGA −26.47 GGCCTG −26.49 CCTGCG −26.49 AGGGTA −26.49 GAACGT −26.5TTTGGT −26.53 ACGGAG −26.54 GGAGCG −26.73 CCGCCG −26.75 CCTACG −26.76GTAACG −26.81 CCCGTA −26.81 GCTTCG −26.82 TAGTCG −26.83 CGTCCA −26.91TGAGTC −27.01 CTCTGG −27.01 ATTGCG −27.01 CGATGG −27.05 GCTAGG −27.14GGGAGT −27.16 ATGTCT −27.2 CTGGGT −27.21 GGACGA −27.23 CGTTTC −27.24ATGACG −27.27 TTCGCT −27.27 AGGGTG −27.33 CTTCGG −27.4 CGAAGT −27.41TTGCCT −27.41 GGATCG −27.41 AGGCGC −27.45 GGGTTA −27.47 ACGCGC −27.48TTGTCA −27.57 TAGTGT −27.58 GAGGTT −27.6 TGTCTC −27.6 GTGATG −27.65GGTCCT −27.65 CGGACC −27.65 TCGCTT −27.66 TCGGAA −27.69 ACGTCA −27.74TTCCCG −27.84 GCACGT −27.87 GTCGCA −27.88 CGTTAA −27.93 ACCTCG −27.95TGGGAG −27.96 CTGTGT −27.97 TAGCGT −28.06 AGGACG −28.08 GGCCTC −28.1AGTACG −28.14 TAAGCG −28.21 CTGCGT −28.23 TGTGTT −28.25 GGGTAA −28.26TTTTCG −28.33 GCGTTT −28.33 TCTCGG −28.34 GCGGAC −28.36 CGACTT −28.38CGACGA 
−28.4 GTTAGT −28.44 CCTCGC −28.53 TTGCGA −28.62 GTCGCT −28.65GTTCTG −28.7 CGCGGA −28.75 GACGTA −28.8 ATGTGT −28.81 CCGCGT −28.84TTAGGC −28.88 CTTTGG −28.94 TACCGG −29 GTAAGT −29.01 ACGAGG −29.02ACGTAG −29.02 TGGCTC −29.02 GCTTGG −29.05 ACGTTA −29.06 AGGAGG −29.07TGACGG −29.11 CCACGT −29.16 CGTATG −29.17 CGGGCA −29.21 AACGGG −29.25CTCCGG −29.27 GGGCCA −29.28 CGTACC −29.28 CCGTAC −29.41 CGTACT −29.46CTGTCT −29.48 TGCCTG −29.64 CTGTGG −29.64 TGGTTG −29.7 GGTTGA −29.72GAGGGC −29.76 TTCGGC −29.83 GGTTGC −29.89 TCTGGT −29.9 CCTCGG −29.96GTTGAC −30 TTGACG −30.03 AACGTT −30.07 CCGACC −30.12 GGGTTT −30.13GTCTAC −30.13 ACGACG −30.19 CGGGCT −30.2 GTAGAG −30.23 GGAACG −30.3GTCTTA −30.3 GCTGGC −30.31 CGTGCT −30.39 CTACGT −30.41 CTTTCG −30.42TGTCCC −30.46 CGGTAG −30.52 TCTGCG −30.54 TCGATG −30.54 TCGGAC −30.58TCCGGC −30.61 TTCGAG −30.64 CCCTCG −30.66 CCGCGA −30.67 ACGTGC −30.67GGCCTA −30.68 CCGGAC −30.7 GCGTAA −30.77 GTCCTC −30.77 TTCGGA −30.82CCCCGC −30.82 AGCGCG −30.84 CTCGCC −30.85 GGCTAG −30.87 CTTACG −30.96GATGTC −30.96 GGACGC −30.98 ACGTCT −30.99 TGTCAG −31 ACGCGA −31.01GTTCGA −31.06 TTGAGG −31.08 TCGTAC −31.09 TTAGGG −31.1 TGCCGC −31.12TAGGCC −31.12 CCCGGG −31.15 CGACCT −31.16 CGAGTC −31.3 TCTGAC −31.36GTCCCT −31.46 TCTGGG −31.48 CGCGTA −31.55 TGAGGG −31.55 CGGGGA −31.59CGACGC −31.63 TGAGGT −31.63 TTTGGC −31.64 CGTCAG −31.68 GATGTG −31.69TGGCGA −31.75 GTGAGC −31.75 GTCGTA −31.76 TCTGGC −31.78 GTGTCT −31.81GCGTCA −31.86 GCGCCT −31.88 CCTGTG −31.89 AGTCGT −31.89 TCGGTA −31.95CCGGTT −31.95 CGCGCT −31.96 CTTGGT −31.98 TTACGT −32.02 GTGTAC −32.06CGCTTG −32.07 CCGACG −32.12 CCGTGT −32.13 GTATCG −32.13 TTAGTC −32.13TCGGCC −32.14 CATGCG −32.19 GTCAGA −32.21 ACGTTG −32.23 CGCATG −32.23TCCTGT −32.37 GCGAGC −32.37 ACGTGT −32.41 ATGCGG −32.41 TGGTCC −32.44ATGGGT −32.58 CGTCTC −32.61 TGTGCC −32.64 CTTGTC −32.65 GGCGGA −32.72GTCTGA −32.74 CTGGTC −32.79 GGGGGA −32.83 AGGGTC −32.86 TAGTCC −32.91CGGGTA −32.94 GCGTTA −32.98 GACCGT −33 GATCGT −33.15 ATCGGT −33.22CACGGG −33.24 GACGTT −33.25 
CACGCG −33.27 GGTAAG −33.36 GTCGAT −33.37GATGCG −33.38 GGACCG −33.38 GCCCCG −33.46 GCGGGA −33.56 GGTCCC −33.6GTATGG −33.62 CCCGTT −33.63 CGCGAT −33.69 CCGTGC −33.75 GACGGT −33.89CGACCG −33.91 CCTGTC −33.96 GTAGTG −34.01 GGGTCA −34.05 TAGGGC −34.19GTTACG −34.32 AGGTAG −34.33 GGCCGC −34.45 GCGGCC −34.5 GCCTCG −34.53CGAACG −34.71 GTCGAA −34.72 CGTCCC −34.81 CTAAGT −34.82 CCGGTA −34.83GTCTGC −34.87 TCGTGA −34.87 CGGAGT −34.91 GGGTAC −34.91 GTGGAC −34.93ACGGTC −34.94 CTCGTA −34.95 TCGAGT −35 TCTGTG −35 GGTTGT −35.09 AGGGGT−35.15 TACGTT −35.18 TCGTCA −35.34 AAGTGT −35.39 TGTAGG −35.44 GCGGTT−35.48 TACGTC −35.52 TGTTGC −35.56 TTGGTG −35.57 AGCGTG −35.58 CTGGCG−35.58 TGTACG −35.8 CGTCAT −35.87 TCCTCG −36.02 GGGCCT −36.04 CTAGTC−36.07 TGTTGG −36.07 GTGGAG −36.09 GGCCGT −36.15 GTTTGC −36.17 CCCCGT−36.18 GTGGTT −36.22 CGCCCT −36.23 TCGCCC −36.23 GATCGG −36.23 TGACCG−36.25 GGGTGA −36.29 TTCCGT −36.3 ATCGGG −36.36 TCCCGG −36.42 TGGCCA−36.53 GTGTAG −36.53 ATGCGT −36.65 GCCCGC −36.69 TGGCGC −36.74 GTGGGA−36.74 TGTTCG −36.88 TGGCCT −36.92 GGTCTA −36.94 TGCGGC −36.96 CGTGAC−37 TAACGT −37.18 TCGTTT −37.19 CGCTAG −37.2 CGGCCG −37.2 CTTGCG −37.21AGGCGG −37.21 CGTTGA −37.28 TGTTTG −37.33 GTAAGG −37.38 CGGACG −37.41CGCCGT −37.43 CGGAGG −37.44 CGTTCC −37.45 TGCGAG −37.46 GTTGGT −37.56TTTGTC −37.57 GAGGGT −37.59 TAAGTC −37.59 GGCTCG −37.63 GACGCG −37.63GGGTCT −37.66 TCGCTC −37.67 TCTCGC −37.72 TTTGTG −37.74 ATCGCG −37.75GGGGTT −37.76 GTCACG −37.82 GGTCTT −37.95 CCCGTC −37.96 CTAGCG −37.99CGCACG −38.02 TTAGGT −38.03 CGGGAG −38.06 GTTTAG −38.11 GCCCGT −38.11GTCCGA −38.11 AGTGAG −38.2 CTTGTG −38.23 TCGAGG −38.24 TTGGCC −38.25AGTCCG −38.38 CGGTTT −38.45 TGCGTT −38.48 CGTGAT −38.53 GCGTTC −38.53TTGGGG −38.54 GGTTTG −38.55 CGGTAC −38.57 TGGCCC −38.57 GCTCGC −38.62ACGCGT −38.63 TGTGAC −38.71 GACCGG −38.72 GCGCCC −38.73 ACCGGG −38.81GGTGCC −38.81 TTGTCC −38.88 TTGCCG −38.89 ACGGGT −38.92 ATGTGG −39GGTCTC −39.04 CGTAAG −39.11 TTCGTT −39.12 TACGGG −39.13 GTCATG −39.17GGACGT −39.21 CGGGAC −39.25 
TGGGTT −39.32 GAGTGC −39.35 CTCTCG −39.4CGCGAC −39.42 TAGGGG −39.5 GGCACG −39.52 CCGCGC −39.55 TGGACG −39.58GGCGAC −39.6 CTGGGG −39.69 CGGGTT −39.73 GTGCCT −39.76 TTGGGC −39.77GCCGTC −39.8 GGCCAG −39.84 CCGTCC −39.9 GCGCGA −39.9 CGCGGC −39.95TCGGGA −39.98 GTCTTG −40.04 AGTGGG −40.12 CCGGGG −40.16 TTTGGG −40.17CTTCGT −40.23 CGGTCA −40.24 CACGTG −40.28 GGTGAG −40.43 GTCGAC −40.48TGCTCG −40.49 TGGGGC −40.62 GGAGGG −40.68 TGTGAG −40.75 GGGGCC −40.89GGTGGT −40.9 AGGCGT −40.91 TCCGTC −40.92 TCCGCG −40.92 GTACCG −41.02AGGTGT −41.07 GCTCGT −41.08 TTTCGT −41.14 TGCCCG −41.17 CGATCG −41.22CGTCTT −41.34 TTCCGG −41.5 GTTTTG −41.52 GCGAGT −41.58 GGGGAG −41.63CTAGGT −41.64 CCGTGG −41.64 GAGGCG −41.68 CCGCGG −41.7 TTGTGC −41.74TTTCGG −41.75 GCGTAC −41.83 GTACGC −41.88 GAGTCG −41.9 TCCGTG −42.01CGGCGA −42.01 CTCCGT −42.07 TTGCGC −42.08 GTGCCC −42.11 GCGTGA −42.12GTAGGC −42.16 CTCGTT −42.19 GTGCGA −42.23 AGTGCG −42.39 CGCCCG −42.43GGCGTA −42.43 GAGCGT −42.48 TCGTTC −42.53 AAGTCG −42.67 GTCAGG −42.73CGTTAG −42.84 TCGGTT −42.92 TCGCGA −42.92 GGGAGG −42.93 GGGACG −42.94GTCAGT −43.2 TGCCGT −43.2 GGGGTA −43.2 GCGTCT −43.23 GCCGCG −43.26AGTCGG −43.28 TCCGTT −43.36 CTCGGG −43.37 GGGCCC −43.5 TAGGCG −43.53GGTCAG −43.58 GGGTAG −43.61 TACGTG −43.67 GTCCTG −43.69 CGCCGC −43.8CTGGCC −43.85 TAGGGT −43.88 GCCGGG −43.92 GGTTAG −43.96 CCGGGT −44.07CTCGTC −44.16 GTCTAG −44.19 GGTGTT −44.21 CCGGGC −44.24 AGTGTG −44.25CGAGGG −44.3 GTTGGC −44.3 CGGCGC −44.3 AGTGTC −44.31 CGTTGT −44.33GTTCGC −44.35 GTTGCC −44.36 GGCGGT −44.48 TTCGCG −44.51 TTGCGG −44.57GTGTTC −44.64 ATGTCG −44.73 GGCGGC −44.81 TCGGAG −44.82 GACGGG −44.82CGTCCT −44.86 TCGACG −44.94 GGACGG −44.99 TGGTGG −44.99 TCCCGT −45.06TGTCGA −45.08 GCTCGG −45.1 GGGCCG −45.15 GTTTGT −45.27 GAGGGG −45.32TTCGTG −45.45 GCGAGG −45.47 CCTCGT −45.53 GCCCGG −45.6 GTCTGT −45.62TTGTCT −45.66 CGGTGT −45.71 CGTTTG −45.74 GGTAGG −45.84 GTCCGC −45.88GCCGGC −45.88 CGCGTT −45.93 ACGGGG −45.94 CCGTTG −45.97 TCTGTC −46.05GGGCGA −46.06 GACGTG −46.08 TTGGTC −46.16 
GCCGGT −46.32 TTGCGT −46.32GGGTTG −46.34 GGCGAG −46.43 CGTGTT −46.44 GGGTCC −46.51 TGGGCC −46.53GCGTTG −46.58 CGACGG −46.68 AGGGGG −46.69 GTGTCA −46.75 GCGTAG −46.76TAGGTC −46.77 CGCGAG −46.79 TGAGTG −47.04 GACGTC −47.04 GTCGGA −47.14GGTTGG −47.18 TCGCGC −47.26 GCGCCG −47.28 TGTCTG −47.32 GCCGTG −47.35CGTTGC −47.38 TCGTCT −47.39 GGCGCC −47.47 GGGGGC −47.53 TTGTGG −47.6CGGTCC −47.61 CGCGCC −47.71 TTCGGT −47.79 TGACGT −47.8 TGTCCG −47.88TGTTGT −47.91 CCCGGT −48.15 GCGTCC −48.17 TCCGGG −48.25 CCCGCG −48.5TCGTCC −48.56 GTTCGT −48.56 GTTCCG −48.59 TTGGCG −48.69 TGGCCG −48.69GCGACG −48.74 GGAGTG −48.78 GTTAGG −49.18 GGCCGG −49.22 CGGTTG −49.22TCTCGT −49.23 CGAGCG −49.24 CGAGGT −49.43 CGTCTG −49.43 CTCGGT −49.55TTCGTC −49.6 GGCGTT −49.72 TCGGCG −49.79 CGTCGA −49.86 GTGACG −49.87CGACGT −49.9 GGTACG −49.96 CGGTGC −50.01 GTACGG −50.02 CGTGAG −50.26CGGCGG −50.36 TTGTGT −50.37 GAGTGG −50.58 TTCGGG −50.66 TGTGTC −50.68TGGGTC −50.7 GGTCGA −50.8 GTTTGG −50.88 CCCGTG −51.09 GTTTCG −51.17CGAGTG −51.21 GAGTGT −51.21 TGGTGT −51.29 TCGGGC −51.42 TGCGCG −51.46TCGCCG −51.51 CCGGTC −51.68 CGCGTC −51.71 GTCGTT −51.72 TGGTCT −51.83CGCGGT −51.85 GGTCTG −51.86 CTCGCG −51.88 CTCGTG −51.94 CGGGCC −52.44GTACGT −52.73 AGGGCG −52.77 GTGCCG −52.81 GTGAGT −52.89 TGTGGT −52.99CGGTCT −53.11 TCGTGG −53.14 CGGGGC −53.26 TCGTTG −53.27 ACGTGG −53.35GGTTCG −53.38 ACGTCG −53.48 GGCCCG −53.53 CGTGCC −53.55 TGGGGG −53.57CGGCGT −53.63 CGTAGG −53.63 GTCTGG −53.69 GTGAGG −53.7 CGTACG −53.71GTTGTC −53.73 TTGGGT −53.74 CGTCCG −53.8 TGCCGG −53.82 TCGTGC −53.92CGGGTC −54 GTCGAG −54.01 CGTTGG −54.19 CCGGCG −54.27 TCCGGT −54.32GCGGGT −54.37 TCGTGT −54.38 CGCCGG −54.53 CGCTCG −54.55 GTCCGT −54.62GGTGGC −54.7 TGCGTC −54.83 GGGTGC −54.96 GTCGCC −55.39 TGTGCG −55.49CGTGTC −55.5 GGCGTC −55.61 GCGCGC −55.66 CTGTCG −55.74 GTCCCG −56.36GCGGGC −56.49 GTAGGG −56.76 TGTCGC −56.8 TCGCGG −56.94 TGGCGT −57.03GTGCGC −57.04 TTGTCG −57.09 GTGTTG −57.15 TGGGGT −57.19 GGTCGC −57.25CGTGGC −57.9 GGGCGC −58.19 TGCGGT −58.27 TGGCGG −58.3 
GGGGGT −58.38TCGGGT −58.51 CCGGTG −58.6 CGTTCG −58.67 TCGGTC −58.82 GTCGGC −58.88GTGGTC −58.88 GTGTGA −59.14 CGTGGT −59.24 GTGGCC −59.29 GCGGTC −59.3GCGCGT −59.36 AGGTCG −59.5 GTCTCG −59.51 GGTGTC −59.7 TGGTCG −59.72GCGGTG −60.02 TGCGTG −60.04 GTGTGT −60.05 GGGGTC −60.15 CGCGCG −60.19TGGGCG −60.25 GCGTGT −60.27 GTTGGG −60.36 TGCGGG −60.39 TGTGGC −60.71GCGCGG −60.73 CGTCGC −60.8 CCGTCG −60.85 GTGGTG −60.86 GTTCGG −61.52GGGCGG −61.53 TCGCGT −61.64 GTGTCC −61.73 GGGTGT −61.79 GGGGGG −62.06TGTGTG −62.08 GCGTGC −62.32 CGGGGG −62.44 CGGGCG −62.52 GGCGTG −62.89TCGGTG −63.03 GGCGGG −63.07 GTTGTG −63.22 GGTCGT −63.3 TCGGGG −63.6GTTGCG −64.3 GGGCGT −64.62 TCGTCG −64.83 GGTCCG −64.88 GCGGCG −64.99GTGCGG −65.11 GGTGCG −65.21 GCGTGG −65.85 GGGGCG −66.57 CGCGTG −66.73GTGTGC −66.98 GTCCGG −67.1 GTGCGT −67.14 TGTCGT −67.26 TGTGGG −67.31CGGTCG −67.35 CGGGGT −67.36 CGCGGG −67.6 TGTCGG −67.61 CGTCGG −68.18GGCGCG −68.24 GGGGTG −68.68 CGTCGT −68.69 GTCGGT −68.84 TGGGTG −69.08GTCGTC −69.14 GCGTCG −69.26 CGGGTG −69.69 GGGTGG −69.98 GTGGGC −70.27CGTGTG −71.38 CGGTGG −71.52 CGTGCG −71.83 GCGGGG −72.46 GTGGGT −73.21GTCGCG −73.55 GTCGTG −73.94 GTGGCG −73.94 GTGGGG −74.96 GGTGGG −75.37CGTGGG −75.74 GGGTCG −76.6 GTCGGG −80.38 GGTCGG −81.93 GGTGTG −82.57GTGTCG −84.85 GTGTGG −90.52

Another application of the rank ordering of all motifs for their ability to increase or decrease SHM-mediated mutagenesis is the creation of gene constructs that are colder or hotter relative to the wild type sequence. Any sequence position in a gene can be evaluated for an equivalent sequence with a hotter or colder motif (relative to the starting or unmodified sequence) that is consistent with the amino acid, according to the z-scores shown in Tables 2 and 3. So while no best preferred hot spot or cold spot motif from Table 7 may be found to substitute at a particular sequence position, a relative improvement in the SHM properties of the replacement motif may almost always be made where there is an underlying degeneracy in the codon sequence. A log-odds based score of the observed to expected mutation frequencies at each motif might also be used to score a tile library for a polynucleotide's cumulative hotness or coldness to SHM.

Thus, one can appreciate that the replacement of any SHM motif with one that has a greater probability of SHM-mediated mutagenesis can result in a sequence that is more susceptible to somatic hypermutation, and that the replacement of any SHM motif with one that has a lower probability of SHM-mediated mutagenesis can result in a sequence that is more resistant to somatic hypermutation.

Tables 2 and 3 show the 3-mer, 4-mer, and 6-mer motifs ranked by z-score for their ability to attract SHM-mediated mutation. Another possible representation of hot spot and cold spot motifs can be made by constructing a position specific matrix from the assembly of motifs represented amongst the highest and lowest z-scores. Below is a single example of a 6-mer motif overrepresented amongst the top scoring “hot spot” motifs, given as a position specific matrix in Table 4:

TABLE 4

Base  Position 1  Position 2  Position 3  Position 4  Position 5  Position 6
A     0.0         1.0         0.0         0.0         0.2         0.5
C     0.75        0.0         0.0         1.0         0.0         0.0
G     0.0         0.0         1.0         0.0         0.0         0.5
T     0.25        0.0         0.0         0.0         0.8         0.0
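As a minimal sketch of how a position specific matrix such as Table 4 might be applied, the Python snippet below scores a candidate 6-mer by multiplying the per-position frequencies. The helper name `pssm_score` and the use of a plain product (rather than, say, a log-odds score against background base frequencies) are illustrative assumptions and are not taken from the disclosure.

```python
# Position specific matrix transcribed from Table 4:
# one row per nucleotide, one column per motif position (1 through 6).
TABLE_4 = {
    "A": [0.0, 1.0, 0.0, 0.0, 0.2, 0.5],
    "C": [0.75, 0.0, 0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0, 0.0, 0.5],
    "T": [0.25, 0.0, 0.0, 0.0, 0.8, 0.0],
}


def pssm_score(motif, matrix=TABLE_4):
    """Product of per-position frequencies for a 6-mer (hypothetical helper)."""
    if len(motif) != 6:
        raise ValueError("expected a 6-mer")
    score = 1.0
    for position, base in enumerate(motif.upper()):
        score *= matrix[base][position]
    return score


# CAGCTA uses the most frequent base at each position of the matrix;
# GTGTGG has a base with zero frequency at position 1, so its score is zero.
print(pssm_score("CAGCTA"))
print(pssm_score("GTGTGG"))
```

A score of zero simply means that at least one position of the candidate is never observed in the matrix; a smoothed or log-odds variant would be needed to rank such candidates against each other.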

In one non-limiting example, the term “preferred hot spot SHM codon” or “preferred hot spot SHM motif” refers to a codon or motif, including but not limited to codons TAC, TAT, AGT, or AGC, potentially embedded within the context of a larger hot spot motif, which recruits AID-mediated mutagenesis and generates targeted amino acid diversity at that position. SHM introduces specific nucleotide transitions at each position of a “hot spot” motif with a frequency that can be quantified. This spectrum of nucleotide transitions results in different possible silent or non-silent amino acid transitions, depending on which of the three possible reading frames is used. By defining the most likely codon transitions mediated by SHM and the sequential flow of mutation events, “preferred hot spot SHM codons” or “preferred hot spot SHM motifs” can be chosen in such a way as to generate a specific panel of amino acid transitions that suit the functionality of a library at each amino acid position, as described in Section V.

IV. Polynucleotide Design Strategies for SHM

Provided herein is a method to design nucleotide templates to either maximize or minimize the tendency of a polynucleotide to undergo SHM, while at the same time maximizing protein expression, RNA stability, and the presence of conveniently located restriction enzyme sites.

Also provided herein are synthetic versions of a polynucleotide that are altered to either enhance or decrease the impact of SHM on the rate of mutagenesis of that polynucleotide compared to its wild type's susceptibility to undergo SHM (i.e., SHM susceptible or SHM resistant).

The SHM susceptible sequences facilitate the rapid evolution and selection of improved mutant versions of proteins, and the system combines the power of rational design with accelerated random mutagenesis and directed evolution.

Conversely, the SHM resistant sequences facilitate the rapid evolution and selection of improved mutant versions of proteins, and the system combines the power of rational design with decreased random mutagenesis and directed evolution.

Also included in the invention are SHM resistant polynucleotide sequences that allow for conserved domains to be resistant to SHM-mediated mutagenesis, while simultaneously targeting desired sequences for increased susceptibility to SHM-mediated mutagenesis.

Polynucleotides for which these methods are applicable include any polynucleotide sequence that can be transcribed and for which a functional assay can be devised for screening. Preferred polynucleotide sequences include those encoding proteins, polypeptides and peptides such as, for example, specific binding members, antibodies or fragments thereof, an antibody heavy chain or portion thereof, an antibody light chain or portion thereof, intrabodies, selectable marker genes, enzymes, receptors, peptide growth factors and hormones, co-factors, and toxins.

Other non-limiting examples of molecules for use herein include polynucleotides that have enzymatic or binding activity without the need for translation into a protein or peptide sequence, such polynucleotides including, for example, enzymatic nucleic acids, antisense nucleic acids, triplex forming oligonucleotides, 2,5-A chimeras, RsiNA, dsRNA, allozymes, and aptamers.

Biologically active molecules of the invention also include molecules capable of modulating the pharmacokinetics and/or pharmacodynamics of other biologically active molecules, for example, lipids and polymers such as polyamines, polyamides, polyethylene glycol and other polyethers. Exemplary polypeptides include, for example, VEGF, VEGF receptor, Diphtheria toxin subunit A, B. pertussis toxin, CC chemokines (e.g., CCL1-CCL28), CXC chemokines (e.g., CXCL1-CXCL16), C chemokines (e.g., XCL1 and XCL2) and CX₃C chemokines (e.g., CX₃CL1), IFN-gamma, IFN-alpha, IFN-beta, TNF-alpha, TNF-beta, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-10, IL-12, IL-13, IL-15, TGF-beta, TGF-alpha, GM-CSF, G-CSF, M-CSF, TPO, EPO, human growth factor, fibroblast growth factor, nuclear co-factors, Jak and Stat family members, G-protein signaling molecules such as chemokine receptors, JNK, Fos-Jun, NF-κB, CD40, CD4, CD8, B7, CD28 and CTLA-4.

One strategy for altering the ability of a polynucleotide to undergo SHM is through modulating the frequency and location of hot spots within the polynucleotide sequence of interest. The position or reading frame of a hot spot or cold spot is also an important factor governing whether SHM-mediated mutagenesis can result in a mutation that is silent with regard to the resulting amino acid sequence. Both the degree of SHM recruitment and the reading frame of the motif are considered in the design of hot and cold spots.

An optimized polynucleotide sequence has been made “susceptible for SHM” if the polynucleotide sequence, or a portion thereof, has been altered, or designed, to increase the frequency and/or location of hot spots within the open reading frame and/or has been altered, or designed, to decrease the frequency and/or location of cold spots within the open reading frame of the polynucleotide sequence compared to the wild type polynucleotide sequence.

Conversely, an optimized polynucleotide sequence has been made “resistant to SHM” if the polynucleotide sequence, or a portion thereof, has been altered to decrease the frequency and/or location of hot spots within the open reading frame of the polynucleotide sequence, and/or has been altered, or designed, to increase the frequency and/or location of cold spots within the open reading frame of the polynucleotide sequence compared to the wild type polynucleotide sequence.

One can target specific regions of a polynucleotide for optimization of either hot spots, cold spots or both. In one embodiment, regions of a polynucleotide can be made hot (e.g., those involved in ligand binding, enzymatic activity, etc.) while other regions of the polynucleotide (e.g., those needed for structural folding, conformation, etc.) can be made cold. For example, it has been observed that in antibody genes, the codon usage and precise concomitant hot spot/cold spot targeting of AID activity and pol eta errors has evolved under selective pressure to maximize mutations in the variable regions and minimize mutations in the framework regions.

A polynucleotide sequence can be prepared that has a greater or lesser propensity to undergo SHM-mediated mutagenesis by selectively altering the codon usage to modulate SHM hot spot and/or cold spot density. Based on this information, it is possible to optimize particular regions of a polynucleotide that appear to be directly involved in a functional attribute of a protein encoded by the polynucleotide. In one non-limiting example, nucleotides to be optimized can encode amino acids that lie within, or within about 5 Å of, a specific functional or structural attribute of interest. Specific examples include, but are not limited to, amino acids within CDRs of antibodies, binding pockets of receptors, catalytic clefts of enzymes, protein-protein interaction domains of co-factors, allosteric binding sites, etc.

SHM hot spot and cold spot motifs have been described elsewhere herein such as, for example, in Tables 2, 3 and 7. Non-limiting examples of 4-mer nucleotide hot spot motifs that can be used for this purpose include, for example, nucleotide sequences of TACC, TACA, TACT, TGCC, TGCA, TGCT, AACC, AACA, AACT, AGCC, AGCA, AGCT, GGTA, TGTA, AGTA, GGCA, TGCA, AGCA, GGTT, TGTT, AGTT, GGCT, TGCT and AGCT, and their complementary sequences encoded on the alternative DNA strand. Exemplary 3-mer cold spot motifs include, for example, nucleotide sequences of CCC, CTC, GCC, GTC, GGG, GAG, GGC and GAC.
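The motif lists above lend themselves to a simple sliding-window count. The following Python sketch counts overlapping occurrences of the listed 4-mer hot spots and 3-mer cold spots in a sequence. The function names are hypothetical, and the exact counting convention used for the tabulated values elsewhere in this document is not stated, so counts from this sketch need not agree with them in every case.

```python
# Exemplary motifs from the text (duplicates in the listing collapse in a set).
HOT_4MERS = {
    "TACC", "TACA", "TACT", "TGCC", "TGCA", "TGCT",
    "AACC", "AACA", "AACT", "AGCC", "AGCA", "AGCT",
    "GGTA", "TGTA", "AGTA", "GGCA", "GGTT", "TGTT",
    "AGTT", "GGCT",
}
COLD_3MERS = {"CCC", "CTC", "GCC", "GTC", "GGG", "GAG", "GGC", "GAC"}


def count_motifs(seq, motifs, k):
    """Count overlapping windows of length k that match any motif."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs)


def hot_cold_counts(seq):
    """Return (hot spot count, cold spot count) for a nucleotide sequence."""
    return count_motifs(seq, HOT_4MERS, 4), count_motifs(seq, COLD_3MERS, 3)


# AGTCGACTT contains the cold motifs GTC and GAC and no listed hot motif.
print(hot_cold_counts("AGTCGACTT"))
```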

An additional consideration is the reading frame of the hot or cold spot. Hot spot motifs can be situated relative to the reading frame such that SHM-mediated mutation is more likely to occur at the wobble position of a codon (the third position), making a silent mutation more likely to result. Conversely, a hot spot motif can be situated relative to the reading frame such that non-silent mutations result from changes in codons (data not shown). As discussed below, these design parameters can be conveniently optimized using an iterative computer algorithm.

In addition to optimizing hot spot and/or cold spot motif density, it can also be desirable to consider the following characteristics such that optimized polynucleotides are efficiently translated and stable in a host system.

The density of CpG dinucleotide motifs: Excessive CG motifs can result in gene methylation leading to gene silencing, and can be normalized to the density found in highly transcribed genes in the host system in question (see for example, Kameda et al., Biochem. Biophys. Res. Commun. (2006) 349(4): 1269-1277).

The ability of single stranded sequences to form stem-loop structures: the formation of stem-loop structures can result in inefficient transcription and/or translation, particularly when located near the 5′ region of the coding frame (see, e.g., Zuker M., Mfold web server for nucleic acid folding and hybridization prediction. Nucl. Acid Res. (2003); 31(13): 3406-3415). Stem-loop structure formation can be minimized by avoiding repetitive or palindromic stretches of greater than 6 nucleotides, for example, near the 5′ end. Alternatively, longer stems are acceptable if the loop contains greater than about 25 nucleotides (nt).

Codon Usage: Appropriate codon usage, i.e., the use of codons that are read by more common and frequently used tRNAs, rather than very rare tRNAs, is important to enable efficient translation in the expression system being used (see generally Nakamura et al., Nuc. Acid. Res. (2000) 28 (1): 292, “Codon usage tabulated from international DNA sequence databases: status for the year 2000;” which includes codon frequency tables of each of the complete protein sequences in the GenBank DNA sequence database as of 2000). Generally codon usage is more important near the 5′ end of the gene where transcription of the polynucleotide begins, and rare codons should be avoided in this region wherever possible. Preferred is the elimination of about 80% or more of the codons that are used less than 10% of the time within the coding frame of the expressed genes in the organism of interest.

GC content: Generally this should be matched to the GC content of highly expressed genes in the host organism; for example, in mammalian systems GC content should be less than about 60%.

Restriction sites: Restriction sites should be placed judiciously where desired. Similarly, important restriction sites (i.e., those that are intended to be used to clone the entire gene, or other genes) within a polynucleotide should be removed where not desired by altering wobble positions.

Stretches of the same nucleotide: Minimize or eliminate stretches of the same nucleotide, limiting them to fewer than six (6) contiguous nucleotides.

In addition, expression can be further optimized by including a Kozak consensus sequence [i.e., (a/g)cc(a/g)ccATGg] at the start codon. Kozak consensus sequences useful for this purpose are known in the art (Mantyh et al. PNAS 92: 2662-2666 (1995); Mantyh et al. Prot. Exp. & Purif. 6,124 (1995)).

Avoid, or minimize, the usage of certain codons (“non preferred SHM codons”) that can be mutated in one step to create a stop codon. “Non preferred codons” include UGG (Trp), UGC (Cys), UCA (Ser), UCG (Ser), CAA (Gln), GAA (Glu) and CAG (Gln).
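Several of the sequence-level criteria listed above (CpG density, GC content, runs of a single nucleotide, and codons one substitution away from a stop codon) can be checked with simple string operations. The Python sketch below is one possible way to do so; all function names are hypothetical, sequences are written in the DNA alphabet (T in place of U), and host-specific targets such as the CpG density of highly transcribed genes are not modeled.

```python
from itertools import groupby

STOP_CODONS = {"TAA", "TAG", "TGA"}


def cpg_count(seq):
    """Number of CG dinucleotides in the sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def gc_fraction(seq):
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def codons_one_step_from_stop():
    """All sense codons that a single base substitution turns into a stop codon."""
    found = set()
    for stop in STOP_CODONS:
        for pos in range(3):
            for base in "ACGT":
                codon = stop[:pos] + base + stop[pos + 1:]
                if codon not in STOP_CODONS:
                    found.add(codon)
    return found


seq = "TCGCGACTT"
print(cpg_count(seq), round(gc_fraction(seq), 2), longest_run(seq))
print(sorted(codons_one_step_from_stop()))
```

Enumerating every single-base neighbor of a stop codon yields a larger set than the codons named in the text, which are a subset of it; which members matter in practice depends on the substitutions that SHM actually favors.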

Beyond sequence specific constraints within the coding sequence of the polynucleotide of interest, additional design criteria for engineering a polynucleotide sequence with altered susceptibility to SHM can include the following factors:

The choice of promoter: a strong promoter will generally induce a higher rate of transcription, resulting in a higher overall rate of mutagenesis compared to a weaker promoter. Further, an inducible promoter, such as the tet-promoter, enables expression, and hence SHM, to be inducibly controlled, so that transcription and mutagenesis of the polynucleotide of interest can be switched on or off. See Gossen and Bujard, Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc Natl Acad Sci USA. 1992 Jun. 15; 89(12):5547-51; Gossen et al., Transcriptional activation by tetracyclines in mammalian cells. Science. 1995 Jun. 23; 268(5218):1766-9.

The location of the coding sequence relative to the transcriptional start point: generally, for high level mutagenesis, the polynucleotide of interest should be located between about 50 nucleotides and 2 kb from the transcriptional start site.

One convenient approach to optimizing a polynucleotide sequence for SHM involves analyzing the corresponding amino acid sequence of interest via a computer algorithm that compares and scores (according to the parameters above) possible alternative polynucleotide sequences that can be used, via alternative codon usage, to encode the amino acid sequence of interest. By iteratively replacing codons, or groups of codons (tiles or SHM motifs), with progressively preferred sequences, it is possible to computationally evolve a polynucleotide sequence with desired properties: specifically, for example, a sequence that is susceptible to SHM, or that is resistant to SHM, and yet also exhibits reasonable translational efficiency and stability, minimizes restriction sites and avoids rare codons in the particular organism of interest.
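The iterative replacement idea can be illustrated with a short greedy search. The Python sketch below is not the program of this disclosure; it is a simplified illustration under stated assumptions: the standard genetic code, an objective equal to the hot spot count minus the cold spot count (using the exemplary motifs given earlier), a deterministic sweep over in-frame 9-mer tiles instead of random positions, and a single hot-or-cold setting for the whole sequence. Translational efficiency, CpG content and restriction sites are not scored. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

HOT_4MERS = {
    "TACC", "TACA", "TACT", "TGCC", "TGCA", "TGCT", "AACC", "AACA", "AACT",
    "AGCC", "AGCA", "AGCT", "GGTA", "TGTA", "AGTA", "GGCA", "GGTT", "TGTT",
    "AGTT", "GGCT",
}
COLD_3MERS = {"CCC", "CTC", "GCC", "GTC", "GGG", "GAG", "GGC", "GAC"}


def hotness(seq):
    """Hot spot count minus cold spot count (an illustrative objective)."""
    hot = sum(seq[i:i + 4] in HOT_4MERS for i in range(len(seq) - 3))
    cold = sum(seq[i:i + 3] in COLD_3MERS for i in range(len(seq) - 2))
    return hot - cold


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def redesign(seq, make_hot=True, rounds=3):
    """Greedily swap synonymous 9-mer tiles to raise (or lower) hotness."""
    sign = 1 if make_hot else -1
    for _ in range(rounds):
        for start in range(0, len(seq) - 8, 3):
            tile_aa = translate(seq[start:start + 9])
            best = seq
            for codons in product(*(SYNONYMS[aa] for aa in tile_aa)):
                candidate = seq[:start] + "".join(codons) + seq[start + 9:]
                # Score the whole sequence so motifs spanning tile edges count.
                if sign * hotness(candidate) > sign * hotness(best):
                    best = candidate
            seq = best
    return seq


hot_version = redesign("AGTCGACTT", make_hot=True)
print(hot_version, translate(hot_version), hotness(hot_version))
```

Because every candidate is built from synonymous codons, the encoded amino acid sequence is unchanged by construction; only the nucleotide-level motif content moves.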

Using this approach, a library of tiles can be generated that is based on the starting amino acid or polynucleotide sequence. In one non-limiting example of the analysis and optimization strategy, the library can be created based on the analysis of groups of 9 nucleotides, corresponding to 3 codons (a “tile” or a “SHM motif”). Each tile can be scored for the attributes described above, to create an initial library data set of tiles, containing hundreds of thousands of 9-mer permutations, and their respective scores.
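As an illustration of how the synonymous 9-mers for one three amino acid tile can be enumerated, the following Python sketch expands a tripeptide into every combination of its synonymous codons. It assumes the standard genetic code; the names `SYNONYMS` and `tiles_for` are hypothetical and are not taken from the program described in this document.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Map each amino acid to the list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def tiles_for(tripeptide):
    """All 9-mer nucleotide tiles that encode the given three amino acids."""
    choices = (SYNONYMS[aa] for aa in tripeptide.upper())
    return ["".join(codons) for codons in product(*choices)]


srl_tiles = tiles_for("SRL")
# Ser, Arg and Leu each have six codons, giving 6 * 6 * 6 = 216 tiles.
print(len(srl_tiles))
```

Serine, arginine and leucine are the three amino acids with six codons each, so this tripeptide gives the largest possible tile set for a single three-residue window.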

A representative sample of a section of the library file is shown in Table 5, which shows the potential diversity in nucleotide sequences arising from alternative codon usage for just the three amino acids Serine (S), Arginine (R) and Leucine (L). A person of skill in the art readily appreciates that a complete set of tiles can be readily assembled for all possible amino acid combinations using known codon usage patterns. Sequence identifiers are given in parentheses next to each sequence.

TABLE 5
Representative polynucleotide diversity encoding a three amino acid sequence (Ser Arg Leu)

3-mer AA  Potential nucleotides  Hotspots  Coldspots  CpG  MaxNt  Log(πp(AA))
SRL       AGTCGACTT (43)         0         2          1    1      −5
SRL       AGTCGACTG (44)         0         2          1    1      −3
SRL       AGTCGATTA (45)         0         1          1    2      −5
SRL       AGTCGACTA (46)         0         2          1    1      −5
SRL       AGTCGACTC (47)         0         3          1    1      −4
SRL       AGTCGATTG (48)         0         1          1    2      −5
SRL       AGTAGGCTT (49)         2         0          0    2      −4
SRL       AGTAGGCTG (50)         2         0          0    2      −2
SRL       AGTAGGTTA (51)         2         0          0    2      −4
SRL       AGTAGGCTA (52)         2         0          0    2      −4
SRL       AGTAGGCTC (53)         2         1          0    2      −3
SRL       AGTAGGTTG (54)         2         0          0    2      −4
SRL       AGTCGTCTT (55)         0         2          1    1      −5
SRL       AGTCGTCTG (56)         0         2          1    1      −3
SRL       AGTCGTTTA (57)         0         1          1    3      −5
SRL       AGTCGTCTA (58)         0         2          1    1      −5
SRL       AGTCGTCTC (59)         0         3          1    1      −4
SRL       AGTCGTTTG (60)         0         1          1    3      −5
SRL       AGTAGACTT (61)         1         1          0    1      −4
SRL       AGTAGACTG (62)         1         1          0    1      −2
SRL       AGTAGATTA (63)         1         0          0    2      −4
SRL       AGTAGACTA (64)         1         1          0    1      −4
SRL       AGTAGACTC (65)         1         2          0    1      −3
SRL       AGTAGATTG (66)         1         0          0    2      −4
SRL       AGTCGGCTT (67)         1         1          1    2      −4
SRL       AGTCGGCTG (68)         1         1          1    2      −2
SRL       AGTCGGTTA (69)         1         1          1    2      −4
SRL       AGTCGGCTA (70)         1         1          1    2      −4
SRL       AGTCGGCTC (71)         1         2          1    2      −3
SRL       AGTCGGTTG (72)         1         1          1    2      −4
SRL       AGTCGCCTT (73)         0         2          1    2      −4
SRL       AGTCGCCTG (74)         0         2          1    2      −2
SRL       AGTCGCTTA (75)         0         1          1    2      −4
SRL       AGTCGCCTA (76)         0         2          1    2      −4
SRL       AGTCGCCTC (77)         0         3          1    2      −3
SRL       AGTCGCTTG (78)         0         1          1    2      −4
SRL       TCACGACTT (79)         0         1          1    1      −5
SRL       TCACGACTG (80)         0         1          1    1      −3
SRL       TCACGATTA (81)         0         0          1    2      −5
SRL       TCACGACTA (82)         0         1          1    1      −5
SRL       TCACGACTC (83)         0         2          1    1      −4
SRL       TCACGATTG (84)         0         0          1    2      −5
SRL       TCAAGGCTT (85)         1         0          0    2      −4
SRL       TCAAGGCTG (86)         1         0          0    2      −2
SRL       TCAAGGTTA (87)         1         0          0    2      −4
SRL       TCAAGGCTA (88)         1         0          0    2      −4
SRL       TCAAGGCTC (89)         1         1          0    2      −3
SRL       TCAAGGTTG (90)         1         0          0    2      −4
SRL       TCACGTCTT (91)         0         1          1    1      −5
SRL       TCACGTCTG (92)         0         1          1    1      −3
SRL       TCACGTTTA (93)         0         0          1    3      −5
SRL       TCACGTCTA (94)         0         1          1    1      −5
SRL       TCACGTCTC (95)         0         2          1    1      −4
SRL       TCACGTTTG (96)         0         0          1    3      −5
SRL       TCAAGACTT (97)         0         1          0    2      −4
SRL       TCAAGACTG (98)         0         1          0    2      −2
SRL       TCAAGATTA (99)         0         0          0    2      −4
SRL       TCAAGACTA (100)        0         1          0    2      −4
SRL       TCAAGACTC (101)        0         2          0    2      −3
SRL       TCAAGATTG (102)        0         0          0    2      −4
SRL       TCACGGCTT (103)        1         0          1    2      −4
SRL       TCACGGCTG (104)        1         0          1    2      −2
SRL       TCACGGTTA (105)        1         0          1    2      −4
SRL       TCACGGCTA (106)        1         0          1    2      −4
SRL       TCACGGCTC (107)        1         1          1    2      −3
SRL       TCACGGTTG (108)        1         0          1    2      −4
SRL       TCACGCCTT (109)        0         1          1    2      −4
SRL       TCACGCCTG (110)        0         1          1    2      −2
SRL       TCACGCTTA (111)        0         0          1    2      −4
SRL       TCACGCCTA (112)        0         1          1    2      −4
SRL       TCACGCCTC (113)        0         2          1    2      −3
SRL       TCACGCTTG (114)        0         0          1    2      −4
SRL       AGCCGACTT (115)        1         2          1    2      −5
SRL       AGCCGACTG (116)        1         2          1    2      −3
SRL       AGCCGATTA (117)        1         1          1    2      −5
SRL       AGCCGACTA (118)        1         2          1    2      −5
SRL       AGCCGACTC (119)        1         3          1    2      −4
SRL       AGCCGATTG (120)        1         1          1    2      −5
SRL       AGCAGGCTT (121)        2         0          0    2      −4
SRL       AGCAGGCTG (122)        2         0          0    2      −2
SRL       AGCAGGTTA (123)        2         0          0    2      −4
SRL       AGCAGGCTA (124)        2         0          0    2      −4
SRL       AGCAGGCTC (125)        2         1          0    2      −3
SRL       AGCAGGTTG (126)        2         0          0    2      −4
SRL       AGCCGTCTT (127)        1         2          1    2      −5
SRL       AGCCGTCTG (128)        1         2          1    2      −3
SRL       AGCCGTTTA (129)        1         1          1    3      −5
SRL       AGCCGTCTA (130)        1         2          1    2      −5
SRL       AGCCGTCTC (131)        1         3          1    2      −4
SRL       AGCCGTTTG (132)        1         1          1    3      −5
SRL       AGCAGACTT (133)        1         1          0    1      −4
SRL       AGCAGACTG (134)        1         1          0    1      −2
SRL       AGCAGATTA (135)        1         0          0    2      −4
SRL       AGCAGACTA (136)        1         1          0    1      −4
SRL       AGCAGACTC (137)        1         2          0    1      −3
SRL       AGCAGATTG (138)        1         0          0    2      −4
SRL       AGCCGGCTT (139)        2         1          1    2      −4
SRL       AGCCGGCTG (140)        2         1          1    2      −2
SRL       AGCCGGTTA (141)        2         1          1    2      −4
SRL       AGCCGGCTA (142)        2         1          1    2      −4
SRL       AGCCGGCTC (143)        2         2          1    2      −3
SRL       AGCCGGTTG (144)        2         1          1    2      −4
SRL       AGCCGCCTT (145)        1         2          1    2      −4
SRL       AGCCGCCTG (146)        1         2          1    2      −2
SRL       AGCCGCTTA (147)        1         1          1    2      −4
SRL       AGCCGCCTA (148)        1         2          1    2      −4
SRL       AGCCGCCTC (149)        1         3          1    2      −3
SRL       AGCCGCTTG (150)        1         1          1    2      −4
SRL       TCGCGACTT (151)        0         1          2    1      −6
SRL       TCGCGACTG (152)        0         1          2    1      −4
SRL       TCGCGATTA (153)        0         0          2    2      −6
SRL       TCGCGACTA (154)        0         1          2    1      −6
SRL       TCGCGACTC (155)        0         2          2    1      −5
SRL       TCGCGATTG (156)        0         0          2    2      −6
SRL       TCGAGGCTT (157)        1         1          1    2      −5
SRL       TCGAGGCTG (158)        1         1          1    2      −3
SRL       TCGAGGTTA (159)        1         1          1    2      −5
SRL       TCGAGGCTA (160)        1         1          1    2      −5
SRL       TCGAGGCTC (161)        1         2          1    2      −4
SRL       TCGAGGTTG (162)        1         1          1    2      −5
SRL       TCGCGTCTT (163)        0         1          2    1      −6
SRL       TCGCGTCTG (164)        0         1          2    1      −4
SRL       TCGCGTTTA (165)        0         0          2    3      −6
SRL       TCGCGTCTA (166)        0         1          2    1      −6
SRL       TCGCGTCTC (167)        0         2          2    1      −5
SRL       TCGCGTTTG (168)        0         0          2    3      −6
SRL       TCGAGACTT (169)        0         2          1    1      −5
SRL       TCGAGACTG (170)        0         2          1    1      −3
SRL       TCGAGATTA (171)        0         1          1    2      −5
SRL       TCGAGACTA (172)        0         2          1    1      −5
SRL       TCGAGACTC (173)        0         3          1    1      −4
SRL       TCGAGATTG (174)        0         1          1    2      −5
SRL       TCGCGGCTT (175)        1         0          2    2      −5
SRL       TCGCGGCTG (176)        1         0          2    2      −3
SRL       TCGCGGTTA (177)        1         0          2    2      −5
SRL       TCGCGGCTA (178)        1         0          2    2      −5
SRL       TCGCGGCTC (179)        1         1          2    2      −4
SRL       TCGCGGTTG (180)        1         0          2    2      −5
SRL       TCGCGCCTT (181)        0         1          2    2      −5
SRL       TCGCGCCTG (182)        0         1          2    2      −3
SRL       TCGCGCTTA (183)        0         0          2    2      −5
SRL       TCGCGCCTA (184)        0         1          2    2      −5
SRL       TCGCGCCTC (185)        0         2          2    2      −4
SRL       TCGCGCTTG (186)        0         0          2    2      −5
SRL       TCCCGACTT (187)        0         2          1    3      −5
SRL       TCCCGACTG (188)        0         2          1    3      −3
SRL       TCCCGATTA (189)        0         1          1    3      −5
SRL       TCCCGACTA (190)        0         2          1    3      −5
SRL       TCCCGACTC (191)        0         3          1    3      −4
SRL       TCCCGATTG (192)        0         1          1    3      −5
SRL       TCCAGGCTT (193)        1         0          0    2      −4
SRL       TCCAGGCTG (194)        1         0          0    2      −2
SRL       TCCAGGTTA (195)        1         0          0    2      −4
SRL       TCCAGGCTA (196)        1         0          0    2      −4
SRL       TCCAGGCTC (197)        1         1          0    2      −3
SRL       TCCAGGTTG (198)        1         0          0    2      −4
SRL       TCCCGTCTT (199)        0         2          1    3      −5
SRL       TCCCGTCTG (200)        0         2          1    3      −3
SRL       TCCCGTTTA (201)        0         1          1    3      −5
SRL       TCCCGTCTA (202)        0         2          1    3      −5
SRL       TCCCGTCTC (203)        0         3          1    3      −4
SRL       TCCCGTTTG (204)        0         1          1    3      −5
SRL       TCCAGACTT (205)        0         1          0    2      −4
SRL       TCCAGACTG (206)        0         1          0    2      −2
SRL       TCCAGATTA (207)        0         0          0    2      −4
SRL       TCCAGACTA (208)        0         1          0    2      −4
SRL       TCCAGACTC (209)        0         2          0    2      −3
SRL       TCCAGATTG (210)        0         0          0    2      −4
SRL       TCCCGGCTT (211)        1         1          1    3      −4
SRL       TCCCGGCTG (212)        1         1          1    3      −2
SRL       TCCCGGTTA (213)        1         1          1    3      −4
SRL       TCCCGGCTA (214)        1         1          1    3      −4
SRL       TCCCGGCTC (215)        1         2          1    3      −3
SRL       TCCCGGTTG (216)        1         1          1    3      −4
SRL       TCCCGCCTT (217)        0         2          1    3      −4
SRL       TCCCGCCTG (218)        0         2          1    3      −2
SRL       TCCCGCTTA (219)        0         1          1    3      −4
SRL       TCCCGCCTA (220)        0         2          1    3      −4
SRL       TCCCGCCTC (221)        0         3          1    3      −3
SRL       TCCCGCTTG (222)        0         1          1    3      −4
SRL       TCTCGACTT (223)        0         2          1    1      −5
SRL       TCTCGACTG (224)        0         2          1    1      −3
SRL       TCTCGATTA (225)        0         1          1    2      −5
SRL       TCTCGACTA (226)        0         2          1    1      −5
SRL       TCTCGACTC (227)        0         3          1    1      −4
SRL       TCTCGATTG (228)        0         1          1    2      −5
SRL       TCTAGGCTT (229)        1         0          0    2      −4
SRL       TCTAGGCTG (230)        1         0          0    2      −2
SRL       TCTAGGTTA (231)        1         0          0    2      −4
SRL       TCTAGGCTA (232)        1         0          0    2      −4
SRL       TCTAGGCTC (233)        1         1          0    2      −3
SRL       TCTAGGTTG (234)        1         0          0    2      −4
SRL       TCTCGTCTT (235)        0         2          1    1      −5
SRL       TCTCGTCTG (236)        0         2          1    1      −3
SRL       TCTCGTTTA (237)        0         1          1    3      −5
SRL       TCTCGTCTA (238)        0         2          1    1      −5
SRL       TCTCGTCTC (239)        0         3          1    1      −4
SRL       TCTCGTTTG (240)        0         1          1    3      −5
SRL       TCTAGACTT (241)        0         1          0    1      −4
SRL       TCTAGACTG (242)        0         1          0    1      −2
SRL       TCTAGATTA (243)        0         0          0    2      −4
SRL       TCTAGACTA (244)        0         1          0    1      −4
SRL       TCTAGACTC (245)        0         2          0    1      −3
SRL       TCTAGATTG (246)        0         0          0    2      −4
SRL       TCTCGGCTT (247)        1         1          1    2      −4
SRL       TCTCGGCTG (248)        1         1          1    2      −2
SRL       TCTCGGTTA (249)        1         1          1    2      −4
SRL       TCTCGGCTA (250)        1         1          1    2      −4
SRL       TCTCGGCTC (251)        1         2          1    2      −3
SRL       TCTCGGTTG (252)        1         1          1    2      −4
SRL       TCTCGCCTT (253)        0         2          1    2      −4
SRL       TCTCGCCTG (254)        0         2          1    2      −2
SRL       TCTCGCTTA (255)        0         1          1    2      −4
SRL       TCTCGCCTA (256)        0         2          1    2      −4
SRL       TCTCGCCTC (257)        0         3          1    2      −3
SRL       TCTCGCTTG (258)        0         1          1    2      −4

Each polynucleotide sequence is ranked based on the following attributes: number of SHM hot and cold motifs, number of CpG motifs, MaxNt (maximum number of nucleotides in a single stretch) and codon usage frequency of the host cell to be used. The term “Log(πp(AA))” contained in the final column of Table 5 was calculated as the log of the product of the individual probabilities of observing each codon, given its amino acid, in the trimer, given by the formula:

Log(πp(AA)) = ln( p(codon_(i−1) | amino acid_(i−1)) * p(codon_(i) | amino acid_(i)) * p(codon_(i+1) | amino acid_(i+1)) ).

Individual probabilities for each amino acid were based on published codon usage patterns in the organism of interest, in this case, mammalian cells. (See generally Nakamura et al., Nucleic Acid Res. (2000) 28 (1): 292, Codon usage tabulated from international DNA sequence databases: status for the year 2000).
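The codon usage term above is the natural log of a product of conditional codon probabilities, which is the same as a sum of logs. A minimal Python sketch follows. The function name is hypothetical, and the probabilities in `example_usage` are round placeholder values chosen only to make the arithmetic easy to follow; they are not taken from any published codon usage table, and this sketch does not attempt to reproduce the tabulated values.

```python
import math


def log_codon_usage_score(codons, usage):
    """ln of the product of p(codon | amino acid) over the codons of a tile.

    `usage` maps each codon to its conditional probability given the
    amino acid it encodes. Summing logs avoids underflow on long inputs.
    """
    return sum(math.log(usage[codon]) for codon in codons)


# Placeholder probabilities for illustration only (not real usage data).
example_usage = {"AGT": 0.1, "AGG": 0.2, "CTG": 0.5}

score = log_codon_usage_score(["AGT", "AGG", "CTG"], example_usage)
# Equivalent to math.log(0.1 * 0.2 * 0.5)
print(score)
```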

As can be readily seen from above, codon usage diversity alone enables polynucleotide sequences to be created that vary widely in their susceptibility to somatic hypermutation, as measured by the number (density) of hot or cold spot motifs present within the sequence.

This analysis readily identifies potential combinations of codons or motifs that are optimized for SHM, minimize CpGs and use optimal codons for efficient translation. For example, the sequences listed below in Table 6 represent top ranking hot sequences because they comprise the maximum number of hot spots and no cold spots. Sequence identifiers are given in parentheses next to each sequence.

TABLE 6
Top Hot Spot Sequences

3-mer AA  Potential nucleotides  Hotspots  Coldspots  CpG  MaxNt  Log(πp(AA))
SRL       AGTAGGCTT (259)        2         0          0    2      −4
SRL       AGTAGGCTG (260)        2         0          0    2      −2
SRL       AGTAGGTTA (261)        2         0          0    2      −4
SRL       AGTAGGCTA (262)        2         0          0    2      −4
SRL       AGCAGGCTT (263)        2         0          0    2      −4
SRL       AGCAGGCTG (264)        2         0          0    2      −2
SRL       AGCAGGTTA (265)        2         0          0    2      −4
SRL       AGCAGGCTA (266)        2         0          0    2      −4
SRL       AGCAGGTTG (267)        2         0          0    2      −4
SRL       AGTAGGTTG (268)        2         0          0    2      −4

Of these, the sequences AGTAGGCTG and AGCAGGCTG are preferred because they encompass codons with a higher frequency of use in mammalian cells.
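The selection just described (keep tiles with the maximum number of hot spots and no cold spots, then prefer the better codon usage score) can be expressed as a filter followed by a sort. The Python sketch below applies it to a handful of rows copied from Table 5; the tuple layout and the function name `top_hot` are assumptions made for illustration.

```python
# A few rows copied from Table 5:
# (sequence, hot spots, cold spots, CpG, MaxNt, Log(πp(AA)))
ROWS = [
    ("AGTCGACTT", 0, 2, 1, 1, -5),
    ("AGTAGGCTT", 2, 0, 0, 2, -4),
    ("AGTAGGCTG", 2, 0, 0, 2, -2),
    ("AGTAGGCTC", 2, 1, 0, 2, -3),
    ("AGCAGGCTG", 2, 0, 0, 2, -2),
    ("AGCAGGTTA", 2, 0, 0, 2, -4),
    ("TCAAGGCTG", 1, 0, 0, 2, -2),
]


def top_hot(rows):
    """Tiles with the most hot spots and no cold spots, best codon score first."""
    max_hot = max(row[1] for row in rows)
    keep = [row for row in rows if row[1] == max_hot and row[2] == 0]
    # Higher (less negative) codon usage score ranks first; ties keep input order.
    return sorted(keep, key=lambda row: row[5], reverse=True)


for row in top_hot(ROWS):
    print(row[0], row[5])
```

With these rows, the two sequences named as preferred in the text sort to the top because they carry the highest codon usage score among the tiles with two hot spots and no cold spots.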

Having defined and scored all possible 9-mer nucleotide tiles, it is possible to scan through a starting amino acid or nucleotide template, identifying positions in the gene/protein that can be improved by substitution from the tile library. This process can be conveniently completed using a computer algorithm, such as the perl program SHMredesign.pl, the code of which is shown below:

    #! /usr/bin/perl
    ############
    #
    #  A program to redesign protein and nucleic acid sequences to be hot or cold to SHM
    #
    ########################################################################
    #
    #   Read in the genetic code, amino acids, and mammalian codon usage frequencies
    #
    #  $cod_aa{ } -> mapping of codon to amino acid
    #  $cod_anti{ } -> mapping of codon to its opposite strand sequence
    #  $codnum{ } -> frequency per 1000 of observing the codon in mammals
    #  $tot_cod{ } -> frequency per 1000 of observing that codon in mammals, given the identity of the amino acid
    #  $aa_cod{ }{ } -> maps an amino acid to its codons with the frequency found in mammalian genes
    #
    ########################################################################

    open(GENE, "./geneticcode.txt");
    while (<GENE>) {
      if (/^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\t(\d+)\t(\d+)/) {
        $one=$1; $four=$4; $five=$5; $six=$6; $thr=$3;
        $cod_aa{$one}=$thr;
        $cod_anti{$one}=$four;
        $codnum{$one}=$five;
        $tot_cod{$one}=int(1000*$five/$six);
        if (!defined($i{$cod_aa{$one}})) { $i{$cod_aa{$one}}=1 }
        for ($j=$i{$cod_aa{$one}}; $j<=$i{$cod_aa{$one}}+$tot_cod{$one}; $j++) {
          $aa_cod{$thr}{$j}=$one;
        }
        $i{$cod_aa{$one}}=$j;
      }
    }
    close(GENE);

    ########################################################################
    #
    # Read in motifs that are hot or cold to SHM, for assessing output only
    #
    #  $hot{ } -> hash containing a list of 4-mer hot spots
    #  $cold{ } -> hash containing a list of 3-mer cold spots
    #
    ########################################################################

    open(SHM, "./hotncold.txt");
    while (<SHM>) {
      if (/^(\S+)\s+(\S+)/) {
        $one=$1; $two=$2;
        if ($one eq 'COLD') {
          $cold{$two}++;
        }
        if ($one eq 'HOT') {
          $hot{$two}++;
        }
      }
    }
    close(SHM);

    ########################################################################
    #
    #  Read in a library of all 9-mer nucleotide motifs that have been
    #  scored for several properties, including
    #  hotspots,
    #  cold spots,
    #  CpG motifs,
    #  the length of the longest uninterrupted stretch of the same nucleotide, and a codon usage score
    #
    #  $hotsc{ }{ } -> hash mapping the tiles 3-mer aa and 9-mer na to the number of SHM hot spots it contains
    #  $coldsc{ }{ } -> hash mapping the tiles 3-mer aa and 9-mer na to the number of SHM cold spots it contains
    #  $cgsc{ }{ } -> hash mapping the tiles 3-mer aa and 9-mer na to the number of CpG motifs it contains
    #  $longsc{ }{ } -> hash mapping the tiles 3-mer aa and 9-mer na to the length of its longest stretch of the same na
    #  $codindexsc{ }{ } -> hash mapping the tiles 3-mer aa and 9-mer na to its aggregate codon usage score
    #
    ########################################################################

    open(LIB, "gunzip -c ./3mer_library.txt.gz |");
    while (<LIB>) {
      if (/^(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\d+)\t(\d+)\t(\S+)/) {
        $hotsc{$1}{$2}=$3;
        $coldsc{$1}{$2}=$4;
        $cgsc{$1}{$2}=$5;
        $longsc{$1}{$2}=$6;
        $codindexsc{$1}{$2}=$7;
      }
    }
    close(LIB);

    ########################################################################
    #
    # Program begins by reading in a fasta-like file containing an amino acid or nucleic acid sequence
    # and a second line that contains design instructions for each position in the construct
    # '+' make this position hot to SHM
    # '-' make this position cold to SHM
    # '.' this position is neutral to SHM
    #
    # Usage:  ./SHMdesign.pl inputfile.fasta A/N
    #  where either A or N is given to indicate an amino acid or nucleic acid sequence
    #
    # $seq -> captures the sequence vector
    # $change -> captures the design change vector
    #
    ########################################################################

    open (FILE, "$ARGV[0]");
    while (<FILE>) {
      if (/\<(\S+)/) {
        $change=$1;
      }
      if (/\>(\S+)/) {
        $seq=uc($1);
      }
    }
    close(FILE);

    ########################################################################
    #
    # if an amino acid sequence is indicated, a starting nucleic acid sequence is generated that
    # is consistent with codon usage, and loaded into the arrays listed below. Else, if a nucleic
    # acid sequence was given as a starting reference the sequence is taken directly from the
    # input file and loaded into arrays
    #
    # $aa_vector[ ] -> array containing amino acid identities of the sequence
    # $ch_vector[ ] -> array containing amino acid identities of the design changes
    # $nuc_vector[ ] -> array containing codons for each position
    # $length -> variable holding the length of the construct in amino acids/codons
    #
    ########################################################################

    if ($ARGV[1] eq 'A') {
      @aa_array=split(//,$seq);
      foreach $aa (@aa_array) {
        chomp $aa;
        $count++; $aa_vector[$count]=$aa;
      }
      @ch_array=split(//,$change);
      foreach $ch (@ch_array) {
        chomp $ch;
        $count2++; $ch_vector[$count2]=$ch;
      }
      if ($count != $count2) {print "COUNT Mismatch\n"}
      for ($length=1; $length<=$count; $length++) {
        $r=int(rand(1000)+1);
        $nuc_vector[$length]=$aa_cod{$aa_vector[$length]}{$r};
      }
    } elsif ($ARGV[1] eq 'N') {
      $count=0;
      @nuc_array=split(//,$seq);
      foreach $nuc (@nuc_array) {
        chomp $nuc;
        $length = int($count/3)+1; $nuc_vector[$length] .= $nuc; $count++;
      }
      $count2=0;
      @ch_array=split(//,$change);
      foreach $ch (@ch_array) {
        chomp $ch;
        $length = int($count2/3)+1; $ch_vector[$length] = $ch; $count2++;
      }
      if ($count != $count2) {print "COUNT Mismatch\n"}
      $templength = int($count/3);
      for ($length=1; $length<=$templength; $length++) {
        $aa_vector[$length]=$cod_aa{$nuc_vector[$length]};
      }
    } else {
      print "\n\n input format:\n ./SHMdesign.pl inputfile.fasta A/N \n\n";
      exit;
    }

    ########################################################################
    # The program begins the process of construct optimization with 20 rounds
    # of 100 attempted tile substitutions at random positions throughout the construct.
    # At the beginning of each round
    for ($j=1; $j<=20; $j++) {

    ############ Print starting state for the round ##########################

      print "ITERATION\t$j\n";
      undef $nuclear; $length2=0;
      ### Amino acid sequence of construct
      for ($i=1; $i<=$length; $i++) {
        print "$cod_aa{$nuc_vector[$i]} ";
      }
      print "\n";
      ### Nucleic acid sequence of the construct
      for ($i=1; $i<=$length; $i++) {
        print "$nuc_vector[$i]";
        @temp=split(//,$nuc_vector[$i]);
        foreach $n (@temp) { $length2++; $nuclear[$length2]=$n }
      }
      print "\n";
      ### SHM Design vector for the construct
      for ($i=1; $i<=$length; $i++) {
        print "$ch_vector[$i]";
      }
      print "\n";
      ### SHM hot spots for the construct
      for ($i=1; $i<=$length2; $i++) {
        $temp="$nuclear[$i]"."$nuclear[$i+1]"."$nuclear[$i+2]"."$nuclear[$i+3]";
        if (defined($hot{$temp})) {print "+"} else {print " "}
      }
      print "\n";
      ### SHM cold spots for the construct
      for ($i=1; $i<=$length2; $i++) {
        $temp="$nuclear[$i]"."$nuclear[$i+1]"."$nuclear[$i+2]";
        if (defined($cold{$temp})) { print "-" } else {print " "}
      }
      print "\n";
      ### CpG motifs within the construct
      for ($i=1; $i<=$length2; $i++) {
        $temp="$nuclear[$i]"."$nuclear[$i+1]";
        if ($temp eq 'CG') { print "C" } else {print " "}
      }
      print "\n";

    ############# End printing section ########################################

    ### Substitute 100 3mer amino acid positions ###########################
    #
    # At a randomly chosen position in the construct, a 9-mer nucleic acid in-frame section is chosen
    # all other nucleotide sequences consistent with the amino acid sequence are evaluated,
    # depending on whether this position is designated a hot, cold or neutral, and the sequence that results in the best
    # design improvement is chosen and substituted.
After a 100interations, the programs evaluates its current state # and prints tothe screen # # $position -> randomly chosen position within theconstruct # $nucleicacid -> current 9-mer nucleic acid at the positionchosen for evaluation # $aminoacid -> current 3-mer amino acid at theposition chosen for evaluation # $better -> flag for best sequencesubstitution at the position, if one is selected # $cur_coldsc,$cur_hotsc, $cur_cgsc, $cur_codindexsc -> place holders for the scores#   of the currently selected 9-mer/3-mer at the position beingevaluated ##########################################################################    for ($k=1; $k<=100; $k++) {   $position=int(rand($length−4))+2;  $pos1=$position−1; $pos2=$position; $pos3=$position+1;  $nucleicacid=“$nuc_vector[$pos1]$nuc_vector[$pos2]$nuc_vector[$pos3]”;$aminoacid=“$cod_aa{$nuc_vector[$pos1]}$cod_aa{$nuc_vector[$pos2]}$cod_aa-{$nuc_vector[$pos3]}”;   $cur_hotsc=$hotsc{$aminoacid}{$nucleicacid};  $cur_coldsc=$coldsc{$aminoacid}{$nucleicacid};  $cur_cgsc=$cgsc{$aminoacid}{$nucleicacid};  $cur_longsc=$longsc{$aminoacid}{$nucleicacid};  $cur_codindexsc=$codindexsc{$aminoacid}{$nucleicacid}; #  print“$k\t$position\t$length\t$aminoacid\t$nucleicacid\t#     $cur_hotsc\t$cur_coldsc\t$cur_cgsc\t$cur_longsc\t#     $cur_codindexsc\n”;   undef $better;   if ($ch_vector[$pos2] eq‘−’) {    foreach $spot3 (keys %{$hotsc{$aminoacid}}) {     if(($cur_coldsc < $coldsc{$aminoacid}{$spot3}) &&      ($cur_hotsc >=$hotsc{$aminoacid}{$spot3}) &&      ($cur_cgsc >=$cgsc{$aminoacid}{$spot3}) &&      ($cur_codindexsc <=$codindexsc{$aminoacid}{$spot}) &&      ($longsc{$aminoacid}{$spot}<=4)) {        $better=$spot3;        $cur_coldsc =$coldsc{$aminoacid}{$spot3};        $cur_hotsc =$hotsc{$aminoacid}{$spot3};        $cur_cgsc =$cgsc{$aminoacid}{$spot3};        $cur_codindexsc =$codindexsc{$aminoacid}{$spot};     }    }   }   if ($ch_vector[$pos2]eq ‘+’) {    foreach $spot3 (keys %{$hotsc{$aminoacid}}) {     if(($cur_coldsc >= 
$coldsc{$aminoacid}{$spot3}) &&      ($cur_hotsc <$hotsc{$aminoacid}{$spot3}) &&      ($cur_cgsc >=$cgsc{$aminoacid}{$spot3}) &&      ($cur_codindexsc <=$codindexsc{$aminoacid}{$spot}) &&      ($longsc{$aminoacid}{$spot}<=3)) {        $better=$spot3;        $cur_coldsc =$coldsc{$aminoacid}{$spot3};        $cur_hotsc =$hotsc{$aminoacid}{$spot3};        $cur_cgsc =$cgsc{$aminoacid}{$spot3};        $cur_codindexsc =$codindexsc{$aminoacid}{$spot};     }    }   }   if ($ch_vector[$pos2]eq ‘.’) {    foreach $spot3 (keys %{$hotsc{$aminoacid}}) {     if(($cur_cgsc >= $cgsc{$aminoacid}{$spot3}) &&      ($cur_codindexsc <=$codindexsc{$aminoacid}{$spot}) &&      ($longsc{$aminoacid}{$spot}<=3)) {        $better=$spot3;        $cur_coldsc =$coldsc{$aminoacid}{$spot3};        $cur_hotsc =$hotsc{$aminoacid}{$spot3};        $cur_cgsc =$cgsc{$aminoacid}{$spot3};        $cur_codindexsc =$codindexsc{$aminoacid}{$spot};     }    }   }########################################################################################################### #     #  if the variable $better isdefined after exhaustively searching for an improved nucleic acidsequence # at the position, substitute that sequence into the evolving$nuc_vector sequence, then proceed with the next trial # # else, go tothe next of the 100 random trails and try again ########################################################################################################        if (defined($better)) {   @array=split(//,$better); $tempcount=0; $tempvector[1]=‘’;$tempvector[2]=‘’; $tempvector[3]=‘’;     foreach $nuc (@array) {     chomp $nuc;      $new_position = int($tempcount/3)+1;     $tempvector[$new_position] .= $nuc; $tempcount++;     }     #######  print“$nuc_vector[$pos1].$nuc_vector[$pos2].$nuc_vector[$pos3]\t$tempvector[1].-$tempvector[2].$tempvector[3]\n”;     $nuc_vector[$pos1]=$tempvector[1];    $nuc_vector[$pos2]=$tempvector[2];    $nuc_vector[$pos3]=$tempvector[3];   }  } } exit;

In addition to the file of potential 3 amino acid tiles shown above, the program also calls upon a file of hot spot and cold spot motifs, as outlined below in Table 7, and a listing of the genetic code to translate amino acid sequences to polynucleotide sequences:

TABLE 7
Canonical Hot and Cold Spot Motifs

Cold spots    Hot spots
CCC           TACC    GGTA
CTC           TACA    TGTA
GCC           TACT    AGTA
GTC           TGCC    GGCA
GGG           TGCA    TGCA
GAG           TGCT    AGCA
GGC           AACC    GGTT
GAC           AACA    TGTT
              AACT    AGTT
              AGCC    GGCT
              AGCA    TGCT
              AGCT    AGCT
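By way of non-limiting illustration, the motifs of Table 7 can be counted in a candidate sequence with a sliding window. The following Python sketch is not part of the program listed above; the names `count_motifs` and `score_tile` are hypothetical, and overlapping windows are assumed to be counted independently.

```python
# Motifs transcribed from Table 7 (duplicate entries collapse in the set).
COLD_SPOTS = {"CCC", "CTC", "GCC", "GTC", "GGG", "GAG", "GGC", "GAC"}
HOT_SPOTS = {
    "TACC", "GGTA", "TACA", "TGTA", "TACT", "AGTA", "TGCC", "GGCA",
    "TGCA", "TGCT", "AGCA", "AACC", "GGTT", "AACA", "TGTT", "AACT",
    "AGTT", "AGCC", "GGCT", "AGCT",
}


def count_motifs(seq, motifs, k):
    """Count overlapping k-mer windows of seq that are listed in motifs."""
    seq = seq.upper()
    return sum(seq[i:i + k] in motifs for i in range(len(seq) - k + 1))


def score_tile(seq):
    """Return hot (4-mer) and cold (3-mer) motif counts for a sequence."""
    return {
        "hot": count_motifs(seq, HOT_SPOTS, 4),
        "cold": count_motifs(seq, COLD_SPOTS, 3),
    }


print(score_tile("AACTAC"))  # one hot spot (AACT), no cold spots
print(score_tile("GGGCCC"))  # four cold spots, no hot spots
```

A tile library of the kind read by the program could, under these assumptions, be scored by applying `score_tile` to each 9-mer.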

One can recognize that there are many potential approaches and computational methods which can be used to find the best codon usage to maximize hot spot or cold spot density, and that the invention is not intended to be limited to any one specific method of determining the optimum sequence.

When a starting amino acid template is given (for instance when the underlying DNA sequence may not be known), the algorithm begins by first generating a DNA nucleotide sequence that is consistent with both the given amino acid sequence and known codon usage in that organism. The starting nucleotide template contains an additional line that instructs the Perl program SHMredesign.pl as to whether HOT or COLD sites should be incorporated at a given position, making it possible to silence or minimize SHM in portions of evolving proteins, while simultaneously directing SHM to areas for targeting, for instance, the CDRs of an antibody molecule. A given 9-mer SHM motif in the polynucleotide can be compared with all other possible nonameric oligonucleotides that would encode the same three amino acids at that position.
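The first step described above, generating a starting nucleotide sequence consistent with codon usage, can be sketched as weighted random sampling of one codon per residue. The Python below is a minimal illustration only: `CODON_WEIGHTS` holds placeholder weights for three residues and is not real codon usage data, and `back_translate` is a hypothetical name.

```python
import random

# Placeholder weights for illustration only; a real run would load
# organism-specific codon usage frequencies for all 20 amino acids.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "N": {"AAC": 0.5, "AAT": 0.5},
    "Y": {"TAC": 0.5, "TAT": 0.5},
}


def back_translate(protein, weights=CODON_WEIGHTS, rng=None):
    """Pick one codon per residue, sampled in proportion to its weight."""
    rng = rng or random.Random()
    codons = []
    for aa in protein.upper():
        table = weights[aa]  # raises KeyError for residues not in the table
        choice = rng.choices(list(table), weights=list(table.values()), k=1)[0]
        codons.append(choice)
    return "".join(codons)


# A seeded generator makes the starting template reproducible.
print(back_translate("MNY", rng=random.Random(0)))
```

Because the starting codons are sampled at random, repeated runs from the same amino acid template can begin from different nucleotide sequences before optimization.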

If a sequence, or portion thereof, is being made susceptible to SHM (made “hot”), an exhaustive search of all nucleotide sequences consistent with the amino acid sequence is made, and the nucleotide sequence of the evolving construct is replaced by a new nucleotide sequence if the following conditions are met: (1) the new 9-mer SHM motif contains more hot spot motifs than the existing sequence, (2) the new 9-mer contains a number of cold spot motifs equal to or less than the evolving sequence, (3) the new 9-mer contains a number of CpG sequence motifs equal to or less than the evolving sequence, (4) the evolving sequence has a codon usage score that equals or improves known aggregate codon usage at the position, and (5) the sequence does not contain a stretch of any one nucleotide greater than 4 residues.

If a sequence, or portion thereof, is being made resistant to SHM (being made “cold”), an exhaustive search of all nucleotide sequences consistent with the amino acid sequence is made, and the nucleotide sequence of the evolving construct is replaced by a new nucleotide sequence if the following conditions are met: (1) the new 9-mer SHM motif contains more cold spot motifs than the existing sequence, (2) the new 9-mer contains a number of hot spot motifs equal to or less than the evolving sequence, (3) the new 9-mer contains a number of CpG sequence motifs equal to or less than the evolving sequence, (4) the evolving sequence has a codon usage score that equals or improves known aggregate codon usage at the position, and (5) the new 9-mer nucleotide sequence does not contain a stretch of any one nucleotide greater than 4 residues.

If a sequence is being optimized for factors other than SHM (being made “neutral”), an exhaustive search of all nucleotide sequences consistent with the amino acid sequence is made, and the nucleotide sequence of the evolving construct is replaced by the new nucleotide sequence if the following conditions are met: (1) the new 9-mer contains a number of CpG sequence motifs equal to or less than the evolving sequence, (2) the evolving sequence has a codon usage score that equals or improves known aggregate codon usage at the position, and (3) the new 9-mer nucleotide sequence does not contain a stretch of any one nucleotide greater than 4 residues.
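The three replacement rules can be summarized as a single acceptance test. The following Python sketch is illustrative and is not the listed program: `TileScore` and `accepts` are hypothetical names, the marks '+', '-' and '.' follow the design line described for the input file, and the run-length limit is exposed as a parameter (assumed default 4, as stated in the text; the listing above applies a limit of 3 for hot and neutral positions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileScore:
    hot: int            # number of SHM hot spot motifs in the 9-mer
    cold: int           # number of SHM cold spot motifs in the 9-mer
    cpg: int            # number of CpG motifs in the 9-mer
    codon_usage: float  # aggregate codon usage score (higher is better)
    longest_run: int    # longest stretch of one repeated nucleotide


def accepts(current, candidate, mode, max_run=4):
    """Return True if candidate may replace current under the given mode."""
    if candidate.longest_run > max_run:
        return False
    if candidate.cpg > current.cpg:
        return False
    if candidate.codon_usage < current.codon_usage:
        return False
    if mode == "+":   # make hot: strictly more hot spots, no more cold spots
        return candidate.hot > current.hot and candidate.cold <= current.cold
    if mode == "-":   # make cold: strictly more cold spots, no more hot spots
        return candidate.cold > current.cold and candidate.hot <= current.hot
    if mode == ".":   # neutral: only the shared conditions apply
        return True
    raise ValueError(f"unknown design mark: {mode!r}")


current = TileScore(hot=1, cold=1, cpg=0, codon_usage=0.5, longest_run=2)
hotter = TileScore(hot=2, cold=1, cpg=0, codon_usage=0.5, longest_run=2)
print(accepts(current, hotter, "+"), accepts(current, hotter, "-"))
```

Keeping the shared CpG, codon usage and run-length checks ahead of the mode-specific checks makes it apparent that the hot and cold rules differ only in which motif count must strictly increase.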

Starting from any given polynucleotide sequence, this approach can be used to generate polynucleotide sequences that rapidly converge to a small number of possible sequences that are optimized for the properties described herein (Example 1).

Following computational analysis, a final optimized polynucleotide can be synthesized using standard methodology and sequenced to confirm correct synthesis. Once the sequence of the polynucleotide has been confirmed, the polynucleotide can be inserted into a vector. The vector can be introduced into a host cell as described herein and tested for expression, activity, or increased and/or decreased susceptibility to SHM.

V. Construction of Synthetic Libraries for SHM Mediated Diversification

Synthetic polynucleotide libraries can be used for the directed evolution and selection of proteins with novel phenotypes by exploiting the diversity generating and targeting properties of SHM.

In the case of antibodies, this means targeted diversification of complementarity determining regions (CDRs) that can bind new or altered epitopes. Simplified CDR libraries containing four- and even two-amino-acid alphabets (serine and tyrosine) have also been described and were found to be capable of binding antigens with high affinity and selectivity. See, e.g., Fellouse F A, Li B, Compaan D M, Peden A A, Hymowitz S G, Sidhu S S, Molecular recognition by a binary code, J. Mol. Biol. (2005) 348:1153-62; and Fellouse F A, Wiesmann C, Sidhu S S, Synthetic antibodies from a four-amino-acid code: a dominant role for tyrosine in antigen recognition, Proc Natl Acad Sci USA (2004) 101:12467-72.

Synthetic polynucleotide libraries can also be used in the case of non-antibody polypeptides such as enzymes and other protein classes. Here, this refers to targeting diversification to regions of the enzyme or protein of interest which regulate the biological activity of said enzyme or protein, such as binding specificity, enzymatic function, fluorescence, or other properties. Libraries are usually combined with one or more selection strategies as disclosed below, which provide for the identification and/or separation of the improved or functional members of the library from the non-functional members of the library.

Static libraries are, in some embodiments, limited in their size and scope. Phage display libraries, for example, can display as many as 10¹² members, and ribosomal libraries have been constructed that potentially contain ~10¹⁶ members. Libraries presented on the surface of bacterial and mammalian cells are not usually this complex, with fewer than about 10⁹ members. In addition, robust library construction and selection usually requires that libraries contain several fold redundancy, which further limits this theoretical complexity, and makes screening the entire library slow, expensive, and in some cases impractical.

Despite these levels of complexity, such static libraries can explore only a small fraction of possible sequence space. In one non-limiting example, a heavy chain IgG sequence can contain more than 30 amino acids within the CDR1, CDR2, and CDR3 complementarity regions, giving this single chain more than 20³⁰ possible permutations, dwarfing even the largest of potential libraries. Because of this limitation, researchers have explored methodologies for evolving protein sequences and libraries. SHM, as addressed in the present application, uses AID and error-prone polymerases as the mechanism for evolving antibody sequences undergoing affinity maturation. A system that would facilitate SHM-mediated mutagenesis and selection at each position of interest within a polynucleotide library of a given gene would permit the selective exploration of functional sequence space. Such a search strategy enables a much larger sequence diversity to be explored, making this method very attractive for the rapid development of new functionalities and therapeutics. For instance, a library composed of only a small number of hot spot codons at each coding position of a stretch of 10 amino acids (2¹⁰ permutations ≈ 1.0*10³), where each position is capable of evolving under SHM to a diverse panel of resulting amino acids, represents a vast simplification of the complexity/diversity needed to encode an equivalent static library of all 20 amino acids at each of the ten library positions (20¹⁰ permutations ≈ 1.0*10¹³).
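The library sizes compared in this passage follow from simple exponentiation (number of choices per position raised to the number of positions). The short Python sketch below, in which `library_size` is a hypothetical helper, works through the figures for a two-codon seed library and a full 20 amino acid static library over ten positions, and for 30 CDR positions.

```python
def library_size(choices_per_position, positions):
    """Number of distinct sequences when each position varies independently."""
    return choices_per_position ** positions


seed = library_size(2, 10)     # two hot spot codons at each of 10 positions
static = library_size(20, 10)  # all 20 amino acids at each of 10 positions
cdr = library_size(20, 30)     # 30 CDR positions, 20 amino acids each

print(f"seed library:   {seed:.2e}")
print(f"static library: {static:.2e}")
print(f"fold reduction: {static // seed:.0e}")
print(f"30-residue CDR space: {cdr:.2e}")
```

Under these assumptions the seed library is ten orders of magnitude smaller than the equivalent static library, and the 30-residue CDR space exceeds the largest display library sizes cited above by many orders of magnitude.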

In one aspect, the present invention includes a synthetic dynamic library that is capable of rapid evolution through SHM-mediated mutagenesis. Such a synthetic library has the following properties: i) The library is easy to synthesize and is based around a limited number of discrete functional sequences; ii) The library contains synthetic polynucleotide sequences that comprise one or more synthetic variable regions that are targeted for selective mutagenesis and include a high density of SHM hot spots; iii) The library contains synthetic polynucleotide sequences that comprise one or more synthetic framework regions that are resistant to SHM mediated mutagenesis and include a low density of SHM hot spots; iv) The library does not contain, or contains a minimum number of, certain codons (“non preferred codons”) that can be mutated to stop codons in one step through SHM, including UGG (Trp), UGC (Cys), UCA (Ser), UCG (Ser), CAA (Gln), GAA (Glu) and CAG (Gln); and v) From the starting set of codons, AID-mediated mutagenesis produces a large potential diversity at each position for further evolution and selection of function (“preferred SHM hot spot codons” or “preferred SHM hot spot motifs”).
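Property iv) lends itself to a simple in-frame scan. The Python sketch below flags the non preferred codons named in the text within a coding sequence; the function name `find_non_preferred` is hypothetical, and the sketch assumes the input begins in frame and may be given as DNA (T) or RNA (U).

```python
# Non preferred codons as listed in the text (RNA alphabet).
NON_PREFERRED = {"UGG", "UGC", "UCA", "UCG", "CAA", "GAA", "CAG"}


def find_non_preferred(cds):
    """Return (codon_number, codon) pairs for in-frame non preferred codons."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    hits = []
    for i in range(0, usable, 3):
        codon = rna[i:i + 3]
        if codon in NON_PREFERRED:
            hits.append((i // 3 + 1, codon))
    return hits


# Codons: AUG UGG AAC CAG -> positions 2 and 4 are flagged.
print(find_non_preferred("ATGTGGAACCAG"))
```

Flagged positions could then be recoded with synonymous codons where available, or reviewed individually where no synonymous alternative exists (for example, Trp has only the codon UGG).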

A method for SHM optimization of gene sequences as described herein uses a previously scored library of 9-mer tiles (SHM motifs) that can be substituted to modify a sequence in order to generate a ‘hotter’ or ‘colder’ sequence as described above, while still maintaining the amino acid sequence and other desirable properties. It should be noted that the tiles, while chosen to be 9 nucleotides in this application, can be of any length, as long as they permit the accurate assessment of sequence based properties. Likewise, although the hot and cold SHM motifs described herein can be used to score the 9-mer tile libraries for substitution, it is possible to use any other sized or defined SHM hot or cold spot motifs in combination with this approach, for example based on hot and cold scores derived from statistical analyses, to make improved constructs. Tiles can be scored based on their use of in-frame SHM hot spot motifs as a way of seeding not just mutagenesis but the resulting amino acid diversity (or lack of same) at various positions throughout the construct. Tiles can be scored based on their use of in-frame SHM cold spot motifs as a way of stabilizing a polynucleotide against AID-mediated mutagenesis at various positions throughout the construct.

One can recognize that there are many potential approaches and computational methods which could be used to find the best codon usage to maximize hot spot or cold spot motif density, and that the invention is not intended to be limited to any one specific method of determining the optimum sequence.

VI. Library Design

A synthetic library around a specific protein of interest can be designed in light of any known pre-existing information regarding structure activity relationships, homology between different species, and x-ray or NMR structural information of the protein or protein family in question, if available.

In one aspect, initial library design can involve the following steps:

1. The amino acid sequence of the protein of interest is identified, and the corresponding polynucleotide sequence determined or reverse transcribed.

2. Any relevant structural information on the protein of interest, and related proteins, or on homologous proteins of interest is obtained.

3. A sequence comparison is performed on the protein of interest compared to all other proteins from closely related species, and known isoforms, and in certain embodiments, a sequence alignment can be created to identify conserved and variable amino acid sequences between species.

This information can be used to establish whether a specific amino acid or protein domain is likely to be important in a functional or structural attribute of the protein of interest, and whether it is conserved or variant across functional isoforms of the protein across protein families.
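As a minimal sketch of how an alignment from step 3 might be turned into conserved and variable positions, the Python below marks each alignment column as conserved only when every sequence carries the same residue. It assumes the sequences are already aligned to equal length with '-' for gaps; the function name `conserved_columns` and the toy sequences are hypothetical.

```python
def conserved_columns(aligned):
    """Return one boolean per alignment column: True if fully conserved."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    flags = []
    for column in zip(*aligned):
        residues = set(column)
        # A column containing a gap is never treated as conserved.
        flags.append(len(residues) == 1 and "-" not in residues)
    return flags


# Toy alignment of three hypothetical orthologs.
alignment = ["MKTAY", "MKSAY", "MRTAY"]
print(conserved_columns(alignment))  # [True, False, False, True, True]
```

Conserved columns would be candidates for cold (SHM resistant) design marks, and variable columns candidates for hot or neutral marks, subject to the structural considerations discussed in this section.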

Based on this information, it is possible to establish particular regions of interest that appear to be directly involved in functional or structural attributes of the protein of interest. For example, these amino acids can lie within, or within about 5 Å of, a specific functional or structural attribute of interest. Specific examples include, but are not limited to, amino acids within CDRs of antibodies, binding pockets of receptors, catalytic clefts of enzymes, protein-protein interaction domains, of co-factors, allosteric binding sites etc.

Based on the structural and sequence analysis as set forth above, one or more polynucleotides can be designed to increase or decrease SHM-mediated mutagenesis using the parameters described above. Furthermore, the design can incorporate one or more of the following concepts:

i) Highly conserved amino acids, or amino acids known or believed to directly contribute key binding energy, can be initially conserved, and the codon usage within their immediate vicinity changed to either create a cold spot motif, or altered to promote mostly conservative amino acid changes during SHM as described herein.

ii) Amino acid domains that appear to be involved in maintaining the core structural framework of the protein can be initially conserved, and their codon usage changed to promote mostly conservative amino acid changes during SHM. Amino acid residues in particularly important framework regions can be altered to use a higher percentage of cold spots, and utilize codons or motifs that are resistant to SHM, or result in silent mutations during SHM.

iii) Amino acids in regions of interest can be varied to incorporate synthetic variable regions enabling high efficiency SHM, as described below.

iv) Amino acids that are not identified as playing clearly identified roles with respect to a function or structure of a polypeptide can be modified to enable effective SHM, i.e., the frequency of SHM hot spots can be maximized and the frequency of SHM cold spots can be minimized.

VII. The Design of Synthetic Variable Regions

The rank ordering of susceptibility to mutagenicity of all SHM hot spots for AID and error-prone polymerases was presented in Section III. We further identified a reading frame context that is critical for generation of silent vs. non-silent mutations. Herein we describe a synthetic library approach that includes the use of a high density of preferred SHM hot spot codons or motifs that lead to generation of diverse amino acids at each library position which is desired to be mutated. Such high density hot spot motifs are particularly important at the boundary of synthetic variable regions to ensure efficient mutagenesis.

A. WAC Based Motifs

Polynucleotide sequences comprising only the sequence WAC (WAC, where W=A or T is encoded in equal proportions, and where the reading frame of reference places C at the wobble or 3rd position of each codon) provide a high density of hot spots.

This simple pattern would produce only 4 potential 6-mer nucleotide patterns containing only two codons (AAC and TAC) encoding the 2 amino acids, Asparagine (Asn) and Tyrosine (Tyr). Sequence identifiers are next to each sequence in parentheses.

TABLE 8

Codons          Amino acids
AACTAC (269)    Asn Tyr
AACAAC (270)    Asn Asn
TACTAC (271)    Tyr Tyr
TACAAC (272)    Tyr Asn

All of the motifs encoded by the WAC library, given in any of the three possible reading frames, produce a concentration of hot spots. FIG. 3 compares these motifs with all other possible 4096 6-mer nucleotide combinations for their ability to recruit SHM-mediated machinery. Longer assemblies result in the same high density of SHM “hot spots” with no “cold spots.” It is also worth noting that this assembly of degenerate codons (WACW) results in a subset of possible 4-mer hot spots described by Rogozin et al. (WRCH), where R=A or G, H=A or C or T, and W=T or A.
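The four WAC 6-mers can be enumerated and checked against the WRCH consensus directly. The following Python sketch is illustrative only: it assumes hot spots are counted as overlapping WRCH matches on the given strand plus its reverse complement, and the names `hexamers`, `revcomp` and `hot_spot_count` are hypothetical.

```python
import itertools
import re

# WRCH consensus: W = A/T, R = A/G, H = A/C/T. The lookahead lets
# overlapping occurrences be counted.
WRCH = re.compile(r"(?=[AT][AG]C[ACT])")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hexamers(codons):
    """All ordered pairs of the given codons, joined into 6-mers."""
    return ["".join(pair) for pair in itertools.product(codons, repeat=2)]


def hot_spot_count(seq):
    """WRCH matches on the sequence and on its reverse complement."""
    return len(WRCH.findall(seq)) + len(WRCH.findall(revcomp(seq)))


for motif in hexamers(("AAC", "TAC")):
    print(motif, hot_spot_count(motif))
```

The same helper can be applied to other codon pairs, for example `hexamers(("AGC", "TAC"))`, to compare alternative synthetic motifs under the same counting assumption.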

As seen in FIG. 4, the preferred SHM hot spot codons AAC and TAC, which are the basis for this synthetic library, can result in a set of primary and secondary mutation events that create considerable amino acid diversity, as judged by equivalent SHM mutation events observed in Ig heavy chain antibodies. From these two codons, basic amino acids (histidine, lysine, arginine), an acidic amino acid (aspartate), hydrophilic amino acids (serine, threonine, asparagine, tyrosine), hydrophobic amino acids (alanine and phenylalanine), and glycine are generated as a result of SHM events.

B. WRC Based Motifs

A second potential synthetic high density SHM motif, termed here the WRC motif, is one that contains two possible codons: AGC and TAC encoding the 2 amino acids, Serine (Ser) and Tyrosine (Tyr). In this embodiment, the four possible 6-mer nucleotides include:

TABLE 9

Codons          Amino acids
AGCTAC (273)    Ser Tyr
AGCAGC (274)    Ser Ser
TACAGC (275)    Tyr Ser
TACTAC (271)    Tyr Tyr

Sequence identifiers are next to each sequence in parentheses.

The distribution of all 4096 6-mer nucleotide z-scores describing the hotness or coldness of the motif to SHM-mediated mutation is illustrated in FIG. 5. The z-scores for all permutations of 6-mers in the WRC synthetic library are superimposed on this distribution, with the dashed line denoting the top 5% of all possible motifs.

The series of mutation events that lead to the creation of amino acid diversity, starting from “preferred SHM hot spot codons” AGC and TAC, as observed in affinity matured IGV heavy chain sequences, is illustrated in FIG. 6. 4200 primary and secondary mutation events, starting from codons encoding serine and tyrosine, lead to a set of functionally diverse amino acids.

Again, this motif results in an unusually high density of optimal SHM hot spots and hot codons, as visualized in FIG. 5, when compared with all other 6-mer nucleotide motifs. Like the WAC synthetic motif, the WRC synthetic motif presents preferred SHM hot spot codons that, when combined with the activity of SHM, AID and one or more error-prone polymerases, generate a broad spectrum of potential amino acid diversity at each position (FIG. 6).

Thus, in one aspect, such synthetic variable regions can be targeted to specific regions of interest within a polynucleotide sequence that encode specific domains or sub domains of interest and for which a high degree of diversity is desired.

In another aspect, WAC or WRC motifs can be inserted systematically throughout the open reading frame of the protein of interest. For example, for a 100 amino acid residue protein, 100 discrete polynucleotides could be generated in which a WAC or WRC motif was separately introduced once into every possible position within the protein. Each of these 100 polynucleotides could then be screened, either separately, or after being pooled into a library, to identify optimal amino acid substitutions at each position. The improved mutations at each position could then be re-combined to create a next generation construct comprising all of the best individual amino acids identified at each position.
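The systematic scan described here amounts to replacing one codon at a time. The Python sketch below generates one variant per codon position for a chosen motif codon; `scan_variants` is a hypothetical name, and the sketch assumes an in-frame coding sequence whose length is a multiple of three.

```python
def scan_variants(cds, motif_codon):
    """Return one variant per codon position with that codon replaced."""
    if len(cds) % 3 != 0 or len(motif_codon) != 3:
        raise ValueError("expected an in-frame sequence and a 3-nt codon")
    variants = []
    for start in range(0, len(cds), 3):
        variants.append(cds[:start] + motif_codon + cds[start + 3:])
    return variants


# A three-codon toy sequence yields three variants, one per position.
for variant in scan_variants("ATGGCTGGA", "TAC"):
    print(variant)
```

For a coding sequence of 100 codons this yields 100 variants per motif codon, which could be screened separately or pooled as described above.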

C. Region Mutagenesis

To provide for effective mutagenesis within larger regions, codons or motifs can be modified as discussed previously to increase the density of hot spots throughout one or more regions of interest. This approach has the advantage of needing no preconceived idea of where SHM should be targeted, or what specific amino acids are essential for activity.

For regions in which efficient SHM is desired, a synthetic variable region can be created by both changing codon usage and by making conservative amino acid substitutions so as to insert codons or motifs that have an improved hot spot density. Suitable amino acid substitutions can be selected from those listed below in Table 10, while observing the same overall criteria for stable gene creation and domain structure.

TABLE 10
Preferred SHM Codons

Codons         Amino Acid   Group/Sub group                Use in place of:
AGC/AGU        Ser          Aliphatic/Slightly non polar   Thr/Cys
GGU            Gly          Aliphatic/Small residue        Ala
GCU/GCA        Ala          Aliphatic/Small residue        Gly
CUA/UUG/CUU    Leu          Aliphatic/Large                Val/Met
Charged
AAA/AAG        Lys          Charged/Positive               Arg
CAU            His          Charged/Positive               Arg/Phe
GAU            Asp          Charged/Negative               Glu
GAG            Glu          Charged/Negative               Asp
Charged/Polar
CAG            Gln          Charged/Polar                  Asn
AAU/AAC        Asn          Charged/Polar                  Gln
Aromatic/Phenyl
UAU/UAC        Tyr          Aromatic/Phenyl                Trp
UUU/UUA/UUC    Phe          Aromatic/Phenyl                Trp/Phe

In some embodiments, the amino acids Trp, Pro and Gly are conserved where their location suggests a functional or structural role. Other than these amino acids, if an amino acid to be optimized is not listed, an amino acid from the same sub-group as listed below is selected.
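Table 10 can be treated as a lookup from the residue being replaced to the preferred residue and its codons. The Python sketch below transcribes the table rows as read from this section; the names `TABLE_10` and `substitutions` are hypothetical, and residues with no entry simply return an empty list rather than applying the sub-group fallback described above.

```python
# Rows transcribed from Table 10: (codons, preferred residue, residues replaced).
TABLE_10 = [
    (("AGC", "AGU"), "Ser", ("Thr", "Cys")),
    (("GGU",), "Gly", ("Ala",)),
    (("GCU", "GCA"), "Ala", ("Gly",)),
    (("CUA", "UUG", "CUU"), "Leu", ("Val", "Met")),
    (("AAA", "AAG"), "Lys", ("Arg",)),
    (("CAU",), "His", ("Arg", "Phe")),
    (("GAU",), "Asp", ("Glu",)),
    (("GAG",), "Glu", ("Asp",)),
    (("CAG",), "Gln", ("Asn",)),
    (("AAU", "AAC"), "Asn", ("Gln",)),
    (("UAU", "UAC"), "Tyr", ("Trp",)),
    (("UUU", "UUA", "UUC"), "Phe", ("Trp", "Phe")),
]


def substitutions(residue):
    """Return (preferred residue, codons) options for replacing a residue."""
    return [
        (preferred, codons)
        for codons, preferred, replaces in TABLE_10
        if residue in replaces
    ]


print(substitutions("Arg"))  # Lys and His options
print(substitutions("Pro"))  # no entry in the table
```

A design tool built on this lookup would still need to apply the conservation of Trp, Pro and Gly noted above before proposing any replacement.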

Such synthetic variable regions can be interspersed with framework regions containing primarily SHM resistant sequences, which can be designed as described previously.

Depending on the amount of information available, a number of distinct library design strategies can be employed, ranging from a very aggressive targeted approach based on the use of WAC or WRC motifs, to a more conservative strategy of using fairly selective amino acid replacements, to a cautious strategy in which only codon usage is changed. An advantage of the present invention is that each approach results in the generation of only a limited number of distinct nucleotide sequences; thus all of these strategies can be subjected to SHM mediated diversity in parallel without significant additional burden.

VIII. Methods for Monitoring SHM Activity

Art-recognized methods for monitoring SHM in antibody and non-antibody proteins using various vectors and cell lines are known (Rückerl et al. (Mol. Immunol. 43 (2006); 1645-1652), Bachl et al. (J. Immunol., 2001, 166: 5051-5057), Cumbers et al. (Nat. Biotechnol. 2002; 20(11): 1129-1134), Wang, et al. (Proc Natl Acad Sci USA. 2004; 101(19):7352-7356)). In addition, various methods for directly measuring cytidine deamination are known in the art; see, e.g., Genetic and in vitro assays of DNA deamination, Coker et al., Meth. Enzymol. 408: 156-170 (2006).

Such methods provide rapid means for evaluating the rate of on-going SHM. The methods include, for example, the use of reporter genes, or selectable marker genes that have been modified to include a stop codon within the coding frame which can be mutated in the presence of AID activity.

As AID acts on a population, it can produce mutations that restore or improve function (of a selectable marker, for instance), or mutations that reduce or eliminate function. The balance between these two rates generates early and rare mutation events that restore function, followed by secondary and tertiary mutation events that destroy function in these proteins. The net effect of these competing rates determines the observed frequency of gain-of-function events in a population. Given three different assumptions regarding the number of inactivating mutations needed to silence GFP, one would expect to observe three very different profiles of reversion events as a function of time, dependent on the rate of enzymatic activity of the AID.
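The competition between restoring and inactivating mutations can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration and not a model taken from this application: it treats both kinds of mutation as independent Poisson processes, and the rates, time units and the function name `revertant_fraction` are hypothetical.

```python
import math


def revertant_fraction(t, revert_rate, inactivate_rate, hits_to_silence):
    """Toy estimate of the fraction of cells showing restored function.

    A cell scores as functional if at least one restoring mutation has
    occurred and fewer than hits_to_silence inactivating mutations have
    accumulated. Both processes are assumed Poisson and independent.
    """
    restored = 1.0 - math.exp(-revert_rate * t)
    expected_hits = inactivate_rate * t
    still_active = sum(
        math.exp(-expected_hits) * expected_hits ** k / math.factorial(k)
        for k in range(hits_to_silence)
    )
    return restored * still_active


# Three assumptions about how many inactivating hits silence the reporter.
for hits in (1, 2, 3):
    profile = [revertant_fraction(t, 0.01, 0.05, hits) for t in (0, 25, 50, 100)]
    print(hits, [round(value, 3) for value in profile])
```

Under these assumptions the revertant fraction rises from zero and later declines as inactivating mutations accumulate, and the profile differs markedly depending on how many inactivating mutations are assumed to be required.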

Additionally, polynucleotides subjected to SHM activity can be sequenced to determine if the nucleotide sequence has been modified and to what degree. Polynucleotides can be rescued from culture to determine SHM at various time points. Methods of isolating and sequencing genes are well known in the art, and include the use of standard techniques such as, for example, Reverse Transcription-Polymerase Chain Reaction (RT-PCR). Briefly, the polynucleotide can be reverse transcribed and subjected to PCR using appropriate primers. Clones can be sequenced using automated DNA sequencers from companies such as Applied Biosystems (ABI-377 or ABI 3730 DNA sequencers). Sequences can be analyzed for frequency of nucleotide modifications. The polynucleotide can be compared with the polynucleotide from the starting material and analyzed for sequence modifications.
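The comparison of a rescued clone with the starting polynucleotide can be sketched as a position-by-position difference. The Python below assumes the two sequences are the same length and already in register (no insertions or deletions); `mutation_summary` is a hypothetical name and the example sequences are invented for illustration.

```python
def mutation_summary(reference, clone):
    """List substitutions between two equal-length sequences."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be the same length")
    changes = [
        (position, ref_base, clone_base)
        for position, (ref_base, clone_base) in enumerate(
            zip(reference.upper(), clone.upper()), start=1
        )
        if ref_base != clone_base
    ]
    frequency = len(changes) / len(reference) if reference else 0.0
    return {"changes": changes, "frequency": frequency}


# Two substitutions in nine nucleotides.
print(mutation_summary("AACTACAGC", "AATTACGGC"))
```

Applying the same comparison to clones rescued at successive time points gives a per-nucleotide substitution frequency that can be tracked over time.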

IX. Somatic Hypermutation (SHM) Systems

A. Synthetic Polynucleotide Sequences

The development of a practical system for the use of SHM requires that mutations be directed to specific genes (polynucleotides) or regions of interest (made “hot”), and be directed away from structural or marker genes that are functionally required within the cell or episome, to maintain overall system functionality and/or stability (made “cold”). In certain embodiments, a synthetic gene is one that does naturally undergo SHM when expressed in a B cell (i.e., an antibody gene). In other embodiments, a synthetic gene is one that does not naturally undergo SHM when expressed in a B cell (i.e., a non-antibody gene).

The present invention is based on the development of a system to design and make or generate SHM susceptible and SHM resistant polynucleotide sequences within a cell or cell-free environment. The present invention is further based on the development of a SHM system that is stable over a suitable time period to reproducibly maintain increased and/or decreased rates of SHM without affecting structural portions or polypeptides or structural proteins, transcriptional control regions and selectable markers. The system allows for stable maintenance of a mutagenesis system that provides for high level targeted SHM in a polynucleotide of interest, while sufficiently preventing non-specific mutagenesis of structural proteins, transcriptional control regions and selectable markers.

In part, the present system is based around the creation of a more stable version of cytidine deaminase that can provide for high level sustained SHM. High level over-expression of wild type AID in mammalian cells does not necessarily lead to a stable increase in SHM activity because the enzyme itself can either accumulate inactivating mutations through SHM of its own DNA sequence, or be silenced through post-translational modifications (Ronai et al., PNAS USA 2005; 102(33): 11829-34; Iglesias-Ussel et al., J. Immunol. Methods. 2006, 316: 59-66). The present SHM system, therefore, includes a synthetic AID gene (SEQ ID NO: 22) that is resistant to SHM (cold) and exhibits a reduced rate of self mutation.

Thus, in one aspect, the present invention includes a synthetic AID gene that has been altered at the polynucleotide level to have a reduced number of hot spots, and/or increased number of cold spots. In one embodiment of this synthetic gene, the gene also has a modified content of CpG methylation sites. In another aspect, the synthetic gene has been optimized for SHM codon usage specific to the organism in which the synthetic gene is to be expressed or mutated.

Additionally, there are a variety of other component nucleotide sequences, such as coding sequences and genetic elements that can make up the core system that one would, in some embodiments, prefer not to hypermutate to maintain overall system integrity. These component nucleotide sequences include, without limitation, i) selectable markers such as neomycin, blasticidin, ampicillin, etc.; ii) reporter genes (e.g. fluorescent proteins, epitope tags, reporter enzymes); iii) genetic regulatory signals, e.g. promoters, inducible systems, enhancer sequences, IRES sequences, transcription or translational terminators, Kozak sequences, splice sites, origin of replication, repressors; iv) enzymes or accessory factors used for high level enhanced SHM, or its regulation, or measurement, such as AID, pol eta, transcription factors, and MSH2; v) signal transduction components (kinases, receptors, transcription factors) and vi) domains or sub domains of proteins such as nuclear localization signals, transmembrane domains, catalytic domains, protein-protein interaction domains, and other protein family conserved motifs, domains and sub-domains.

Thus, in another aspect of the present invention, the SHM system described herein can include any synthetic gene that has been altered, at the polynucleotide level, either in whole, or part, to have a reduced number of hot spots, and/or an increased number of cold spots using the methods described herein. In one embodiment, the synthetic gene also has a modified content of CpG methylation sites.

Provided herein is an expression vector, comprising at least one synthetic gene. In one aspect, the expression vector is an integrating expression vector. When the expression vector is an integrating expression vector, the expression vector can further comprise one or more sequences to direct recombination. In another aspect, the expression vector is an episomal expression vector. In yet another aspect, the expression vector is a viral expression vector.

In another aspect, the synthetic gene has been optimized for SHM codon usage specific to the organism in which the synthetic gene is to be expressed and/or mutated.

Provided herein is a SHM resistant synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a lower probability of SHM, wherein said SHM resistant synthetic gene exhibits a lower rate of AID-mediated mutagenesis compared to said unmodified polynucleotide sequence.

The present invention also contemplates that a SHM resistant synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

In one embodiment, the SHM resistant synthetic gene encodes a protein or portion thereof having about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, or any percentage between about 50% and about 100% identity to the unmodified gene.

In one embodiment, the SHM resistant synthetic gene exhibits a lower rate of AID-mediated mutagenesis including, but not limited to, 1.05-fold, 1.1-fold, 1.2-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 50-fold, 100-fold, 200-fold, 500-fold, 1000-fold or less, or any range therebetween.

In one embodiment, the SHM resistant synthetic gene exhibits a rate of AID-mediated mutagenesis at a level which is less than about 99%, less than about 95%, less than about 90%, less than about 85%, less than about 80%, less than about 75%, less than about 70%, less than about 65%, less than about 60%, less than about 55%, or less than about 50%, of that exhibited by an unmodified gene.

In other embodiments, a high rate of SHM mediated mutagenesis targeted to a polynucleotide of interest is desirable to direct rapid directed evolution of the polynucleotide.

Provided herein is a SHM susceptible synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, wherein said SHM susceptible synthetic gene exhibits a higher rate of AID-mediated mutagenesis compared to said unmodified polynucleotide sequence.

The present invention also contemplates that a SHM susceptible synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

In one embodiment, the SHM susceptible synthetic gene encodes a protein or portion thereof having about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, or any percentage between about 50% and about 100% identity to the unmodified gene.

In one embodiment, the SHM susceptible synthetic gene exhibits a higher rate of AID-mediated mutagenesis including, but not limited to, 1.05-fold, 1.1-fold, 1.25-fold, 1.5-fold, 2-fold, 5-fold, 10-fold, 50-fold, 100-fold, 200-fold, 500-fold, 1000-fold or more, or any range therebetween.

In one embodiment, the SHM susceptible synthetic gene exhibits a rate of activation induced cytidine deaminase (AID)-mediated mutagenesis at a level which is at least about 101%, at least about 105%, at least about 110%, at least about 115%, at least about 120%, at least about 125%, at least about 130%, at least about 135%, at least about 140%, at least about 145%, at least about 150%, at least about 200%, at least about 250%, at least about 300%, at least about 350%, at least about 400%, at least about 450%, at least about 500%, or higher of that exhibited by an unmodified gene.

Provided herein is a selectively targeted, SHM optimized synthetic gene encoding a protein, or a portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM; and one or more third SHM motifs in said unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more fourth SHM motifs having a lower probability of SHM; wherein said selectively targeted, SHM optimized synthetic gene exhibits targeted AID-mediated mutagenesis. In such an embodiment, the selectively targeted, SHM optimized synthetic gene has some portions that exhibit a higher rate of AID-mediated mutagenesis and some portions that exhibit a lower rate of AID-mediated mutagenesis.

The present invention also contemplates that a selectively targeted SHM optimized synthetic gene can be created in a step-wise or sequential fashion such that some modifications are made to the gene and then a subsequent round of modification is made to the gene. Such sequential or step-wise modifications are contemplated by the present invention and are one way of carrying out the process and one way of producing the genes claimed herein.

Further provided herein is a system that enables high level somatic hypermutation that can be targeted to specific polynucleotides (e.g., synthetic genes or areas within synthetic genes) of interest, while avoiding the non-specific mutagenesis of components involved in the maintenance of the mutagenesis and expression system. Polynucleotides which can be mutated in the present system include any polynucleotide sequence that can be expressed and modified through AID-mediated mutagenesis. Such polynucleotides include those that encode polypeptides including, for example, specific binding members, enzymes, receptors, neurotransmitters, hormones, cytokines, chemokines, structural proteins, co-factors, toxins, or any other polypeptide or protein of interest or portion thereof. In one aspect such polynucleotides encode immunoglobulin polypeptides (antibodies) such as variable heavy chains or light chains or portions thereof.

Provided herein is a SHM system, e.g. one or more vectors (e.g., expression vectors) for SHM comprising at least one component (e.g. polynucleotide, gene, nucleic acid sequence, coding sequence, genetic element, a portion thereof, etc.) that includes, but is not limited to, a polynucleotide that has been altered from wild type to either positively or negatively influence the rate of SHM experienced by that component, or portion thereof.

In one aspect of the SHM system, at least one component of the expression system comprises a polynucleotide that has been altered, in whole, or part, to negatively influence the rate of SHM experienced by that component. In one aspect of this system, the component is a polynucleotide encoding a protein such as, but not limited to, an AID, or an AID homolog, Pol eta, a selectable marker, a fluorescent protein, EBNA1, or a repressor (or transactivator) protein.

In one aspect of the SHM system, at least one component of the expression system comprises a polynucleotide that has been altered, in whole, or part, from wild-type to positively influence the rate of SHM experienced by that component; or comprises a polynucleotide that has a high rate of SHM, such as, for example, a hypervariable region of an antibody gene, the DNA binding domain of a transcription factor, the active site of an enzyme, the binding domain of a receptor, or a sub domain or domain of interest.

In another aspect, the SHM system comprises: i) at least one polynucleotide that can be a polynucleotide that has been altered in whole or part, from wild type to positively influence the rate of SHM experienced by that polynucleotide, or a polynucleotide that has a naturally high frequency of hot spots, and ii) a polynucleotide that has been altered in whole or part, from wild-type to negatively influence the rate of SHM.

In one embodiment, provided herein is a SHM susceptible gene encoding a protein or a portion thereof wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, said synthetic gene having a greater density of hot spot motifs than said unmodified polynucleotide sequence.

In another embodiment, provided herein is a SHM resistant synthetic gene encoding a protein or a portion thereof wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a lower probability of SHM, said synthetic gene having a greater density of cold spots than said unmodified polynucleotide sequence.

In yet another embodiment, provided herein is a selectively targeted, SHM optimized synthetic gene encoding a protein or portion thereof, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more second SHM motifs having a higher probability of SHM, said synthetic gene having a greater density of hot spot motifs than said unmodified polynucleotide sequence; and one or more third SHM motifs in said unmodified polynucleotide sequence encoding said protein or portion thereof has been replaced by one or more fourth SHM motifs having a lower probability of SHM, said synthetic gene having a greater density of cold spots than said unmodified polynucleotide sequence; wherein said selectively targeted, SHM optimized synthetic gene exhibits targeted AID-mediated mutagenesis.

In yet another non-limiting aspect, said synthetic gene includes one or more amino acid mutations that introduce preferred SHM hot spot motifs.

Thus, in one aspect, the present invention includes a method of creating SHM resistant (“cold”) or SHM susceptible (“hot”) genes wherein the hot spot or cold spot density is altered from the unmodified density, while maintaining translational efficiency of the synthetic protein (i.e. the ability to be successfully expressed in a mammalian cell).

In one aspect, a synthetic SHM resistant gene of the present invention has an average hot spot density of 7, or less, hot spots per 100 nucleotides. In another aspect, said synthetic SHM resistant gene has an average cold spot density of greater than 16 cold spots per 100 nucleotides.
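Average densities of this kind can be computed with a short script. The sketch below is illustrative only: it assumes the commonly cited canonical motifs WRCY/RGYW for hot spots and SYC/GRS for cold spots, which may differ from the motif set used elsewhere in this specification, and it counts each start position at most once. The names `count_sites` and `density_per_100` are hypothetical.

```python
import re

# IUPAC codes expanded to regex classes: W = A/T, R = A/G, Y = C/T, S = C/G.
# Motif choice is an assumption (canonical AID motifs), not taken from the text.
# A lookahead is used so that overlapping motif occurrences are all counted.
HOT = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")   # WRCY or RGYW
COLD = re.compile(r"(?=([CG][CT]C|G[AG][CG]))")           # SYC or GRS

def count_sites(seq, pattern):
    """Number of start positions in seq at which the motif pattern matches."""
    return len(pattern.findall(seq.upper()))

def density_per_100(seq, pattern):
    """Motif count normalized to sites per 100 nucleotides."""
    return 100.0 * count_sites(seq, pattern) / len(seq) if seq else 0.0
```

Under these assumptions, a candidate resistant gene could be screened by checking `density_per_100(seq, HOT) <= 7` and `density_per_100(seq, COLD) > 16`.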

In a preferred aspect, a synthetic SHM resistant gene of the present invention has an average hot spot density of less than 6 hot spots per 100 nucleotides. In another aspect, said synthetic SHM resistant gene has an average cold spot density of greater than 18 cold spots per 100 nucleotides. In another aspect, said synthetic SHM resistant gene has an average cold spot density of greater than 20 cold spots per 100 nucleotides. In another aspect, said synthetic SHM resistant gene has an average cold spot density of greater than 22 cold spots per 100 nucleotides.

In one aspect, a synthetic SHM susceptible gene of the present invention has an average cold spot density of 13 or less cold spots per 100 nucleotides. In another aspect, said synthetic SHM susceptible gene has an average hot spot density of greater than 10 hot spots per 100 nucleotides.

In a preferred aspect, a synthetic SHM susceptible gene of the present invention has an average cold spot density of less than 11 cold spots per 100 nucleotides. In another aspect, said synthetic SHM susceptible gene has an average hot spot density of greater than 13 hot spots per 100 nucleotides. In another aspect, said synthetic SHM susceptible gene has an average hot spot density of greater than 14 hot spots per 100 nucleotides. In another aspect, said synthetic SHM susceptible gene has an average hot spot density of greater than 16 hot spots per 100 nucleotides.

In another aspect, a synthetic SHM susceptible polynucleotide has an average hot spot density of greater than 20 hot spots per 100 nucleotides over at least 12 contiguous nucleotides. In another aspect, a synthetic SHM susceptible polynucleotide has an average hot spot density of greater than 20 hot spots per 100 nucleotides over at least 15 contiguous nucleotides. In another aspect, a synthetic SHM susceptible polynucleotide has an average hot spot density of greater than 20 hot spots per 100 nucleotides over at least 18 contiguous nucleotides. In another aspect, a synthetic SHM susceptible polynucleotide has an average hot spot density of greater than 20 hot spots per 100 nucleotides over at least 30 contiguous nucleotides.
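Local density over a stretch of contiguous nucleotides can be checked with a sliding window. The following sketch assumes the canonical WRCY/RGYW hot spot motifs and counts only motifs lying entirely inside the window; both choices, and the name `hot_windows`, are illustrative assumptions.

```python
import re

# Assumed hot spot motifs (WRCY or RGYW); lookahead finds overlapping sites.
HOT = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")
MOTIF_LEN = 4

def hot_windows(seq, window=12, threshold=20.0):
    """Return (start, density) for each window whose hot spot density,
    in sites per 100 nucleotides, exceeds the threshold."""
    seq = seq.upper()
    starts = [m.start() for m in HOT.finditer(seq)]
    hits = []
    for left in range(0, len(seq) - window + 1):
        # Count motifs that fit completely within [left, left + window).
        n = sum(1 for s in starts if left <= s and s + MOTIF_LEN <= left + window)
        density = 100.0 * n / window
        if density > threshold:
            hits.append((left, density))
    return hits
```

With a 12-nucleotide window, at least three hot spots are needed to exceed 20 per 100 nucleotides under this counting convention.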

In one non-limiting example, a synthetic gene does not include genes comprising a stop motif inserted into an open reading frame.

In each of the SHM systems described herein, the systems can further include one or more of the following additional elements: i) an inducible system to regulate the expression of AID, or an AID homolog, ii) one or more enhancers, iii) one or more E-boxes, iv) one or more auxiliary factors for SHM, or v) one or more factors for stable episomal expression, such as EBNA1, or EBP2.

In another aspect of the SHM systems, the system includes two polynucleotides in which both polynucleotides are located in proximity to a promoter. In one aspect of the system, the promoter can be a bi-directional promoter, such as a bi-directional CMV promoter.

In another aspect, the polynucleotide can encode antibody chains. For example, heavy or light chains can be inserted into a vector for expression. In other aspects the polynucleotide can encode any protein, i.e. a wild-type polypeptide, a non-wild-type polypeptide, a synthetic polypeptide, a recombinant polypeptide or any portion thereof such as, without limitation, antibody heavy chains or fragments thereof, antibody light chains or fragments thereof, enzymes, receptors, structural proteins, co-factors, synthetic peptides, intrabodies, or toxins.

Provided herein is a method for preparing a gene product having a desired property, comprising: a) preparing a synthetic gene encoding a gene product which exhibits increased somatic hypermutation; b) expressing said synthetic gene in a population of cells; wherein said population of cells express activation induced cytidine deaminase (AID), or can be induced to express AID via the addition of an inducing agent; and c) selecting a cell or cells within the population of cells which express a mutated gene product having the desired property. In one aspect, the method, optionally, further comprises activating or inducing the expression of AID in said population of cells. In another aspect, the method, optionally, further comprises establishing one or more clonal populations of cells from the cell or cells identified in (c). In yet another aspect of the method, at least one synthetic gene is located in an expression vector such as any one of the vectors described elsewhere herein. In one aspect of the method, the cell is a cell as described elsewhere herein.

Provided herein is a method for preparing a gene product having a desired property, comprising: a) expressing said gene product in a population of cells; wherein said population of cells comprises at least one synthetic gene which exhibits decreased somatic hypermutation; and wherein said population of cells express an activation induced cytidine deaminase (AID), or can be induced to express AID via addition of an inducing agent; and b) selecting a cell or cells within the population of cells which express a mutated gene product (e.g., a polypeptide encoded by the mutated synthetic gene, the gene having one or more mutations) having the desired property. In one aspect, the method, optionally, further comprises activating or inducing the expression of AID in said population of cells. In another aspect, the method, optionally, further comprises establishing one or more clonal populations of cells from the cell or cells identified in (b). In yet another aspect of the method, at least one synthetic gene is located in an expression vector such as any one of the vectors described elsewhere herein. In one aspect of the method, the cell is a cell as described elsewhere herein.

In one aspect, a protein encoded by a synthetic gene is selected from among antibodies or antigen-binding fragments thereof, selectable markers, fluorescent proteins, cytokines, chemokines, growth factors, hormones, enzymes, receptors, structural proteins, toxins, co-factors and transcription factors.

Provided herein is a method of generating SHM-susceptible or SHM-resistant polynucleotides by modifying hot and/or cold spots in a polynucleotide encoding a polypeptide. The method of generating a SHM-susceptible or SHM-resistant polynucleotide includes the following: a) identifying a polypeptide or portion thereof; b) generating a polynucleotide sequence that codes for the identified polypeptide sequence; c) changing the codon usage in the polynucleotide sequence to increase or decrease the frequency, or location with respect to the reading frame, of hot spots and/or cold spots within that polynucleotide sequence, without substantially changing the amino acids encoded by the polynucleotide sequence; d) selecting a polynucleotide sequence in which the frequency of hot and/or cold spots has been altered to the desired degree. In one aspect the frequency of hot spots can be altered by about 0% to about 25%, in another aspect by about 25% to about 50%, in still another aspect by about 50% to about 75%, and in yet another aspect by about 75% to about 100% of all possible hot spots or cold spots.
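The codon-usage step of such a method can be sketched as a greedy synonymous-codon search. This is a simplified illustration and not the program used by the inventors: it assumes the standard genetic code, the canonical WRCY/RGYW hot spot motifs, and a left-to-right greedy choice that ignores codon usage frequency and cold spots. The names `redesign`, `hot_count` and `translate` are hypothetical.

```python
import re
from itertools import product

# Standard genetic code built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

# Assumed hot spot motifs (WRCY or RGYW), counted with overlaps.
HOT = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")

def hot_count(seq):
    return len(HOT.findall(seq))

def translate(seq):
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

def redesign(seq, make_hot=False):
    """Greedily swap synonymous codons to lower (or raise) the hot spot count
    while leaving the encoded amino acid sequence unchanged."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    pick = max if make_hot else min
    for i, codon in enumerate(codons):
        # The current codon is listed first, so ties keep the original codon.
        options = [codon] + [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        codons[i] = pick(
            options,
            key=lambda c: hot_count("".join(codons[:i] + [c] + codons[i + 1:])),
        )
    return "".join(codons)
```

Because the original codon is always among the candidates, the hot spot count can only stay the same or move in the requested direction at each step. A fuller implementation would also weigh the other optimization factors discussed in this section.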

Provided herein is a method for preparing a gene product having a desired property, comprising: (a) expressing a synthetic gene in a population of cells; wherein said population of cells express AID, or can be induced to express AID via the addition of an inducing agent; and (b) selecting a cell or cells within the population of cells which express a modified gene product having the desired property.

In another aspect of the present invention, the polynucleotide sequence can also be altered by the substitution of non preferred codons or motifs for more preferred codons or motifs.

In another aspect of the present invention, the polynucleotide sequence can also be altered by the substitution or replacement of nucleotides to further alter the frequency and/or location of hot spots. In one aspect, these nucleotide substitutions can be conservative amino acid replacements, or be located in variable regions across protein families.

Provided herein is a method of optimizing a sequence for SHM by making it susceptible or resistant to SHM, comprising the steps of a) identifying a polynucleotide sequence; b) changing the codon usage in the polynucleotide sequence to alter the frequency and/or location of hot spots or cold spots within the polynucleotide sequence, and at least two parameters selected from the group of optimization factors consisting of: CpG dinucleotide frequency; the predicted formation of stem-loop structures; restriction site frequency; mammalian codon usage; limiting global GC content to less than about 60%; and minimizing or eliminating stretches greater than (>) 6 of the same nucleotide; wherein said SHM susceptible or SHM resistant sequence exhibits altered susceptibility to SHM, and is translated into protein within a cell at a level that is equivalent to an unmodified polynucleotide sequence.
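Several of the listed optimization factors are simple sequence statistics. The sketch below evaluates three of them (global GC content, longest single-nucleotide stretch, and CpG dinucleotide count); stem-loop prediction, restriction site frequency and codon usage are not covered. The function names and the returned dictionary layout are illustrative assumptions.

```python
import re

def gc_content(seq):
    """Global GC content as a percentage of sequence length."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def cpg_count(seq):
    """Number of CpG dinucleotides on the given strand."""
    return seq.upper().count("CG")

def longest_homopolymer(seq):
    """Length of the longest run of a single repeated nucleotide."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())), default=0)

def check_constraints(seq, max_gc=60.0, max_run=6):
    """Report whether a candidate sequence meets the GC and run-length limits."""
    return {
        "gc_ok": gc_content(seq) < max_gc,
        "run_ok": longest_homopolymer(seq) <= max_run,
        "cpg_sites": cpg_count(seq),
    }
```

A redesign loop could call `check_constraints` on each candidate and discard sequences that fail either limit before comparing hot spot and cold spot densities.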

In one aspect of this method, step b) involves the alteration of at least three parameters from the group of optimization factors.

In one aspect of this method, step b) involves the alteration of at least four parameters from the group of optimization factors.

In one aspect of this method, step b) involves the alteration of at least five parameters from the group of optimization factors.

In one aspect of this method, step b) involves the alteration of at least six parameters from the group of optimization factors.

In one aspect, step b) is conducted by using a Perl program, SHMredesign.pl (www.PERL.COM/DOWNLOAD.CSP).

These methods can be used for generating SHM-susceptible or SHM-resistant polynucleotides of any sequence. In either case, the polynucleotide can include an open reading frame of at least 18 nucleotides, and can be operatively linked to regulatory elements to enable the polynucleotide of interest to be efficiently transcribed into RNA. The polynucleotides can be cloned using standard, art-recognized techniques into a vector. In one aspect, the vector is a vector described herein.

Non-limiting examples of polynucleotides to be modified in whole, or part, using such methods include wild-type polynucleotides, synthetic polynucleotides, recombinant polynucleotides or any portion thereof. The polynucleotide can encode a protein or polypeptide sequence.

Provided herein are SHM resistant polynucleotide sequences encoding various genes generated using the methods described herein. In one non-limiting embodiment, the gene encodes an enzyme involved in SHM, including without limitation, activation-induced cytidine deaminase (AID), Pol eta, and UDG made by the methods described herein.

In one embodiment, the SHM resistant polynucleotide that encodes an enzyme involved in SHM is derived from a vertebrate. In one aspect the gene is derived from a mammal, in one aspect from a mammal selected from the group consisting of rat, dog, human, mouse, cow, and primate.

In one embodiment the SHM resistant enzyme is AID having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 22.

In one embodiment, the SHM resistant gene is Pol eta having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 23.

In one embodiment, the SHM resistant gene is UDG having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 24.
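Percent identity thresholds such as these can be checked computationally once two sequences are aligned. The sketch below assumes the sequences are already aligned to equal length, with "-" marking gaps, and counts a gap opposite a base as a mismatch; identity conventions vary, so this is only one possible definition, and the name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns) if columns else 0.0
```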

In one embodiment, the cold AID is a mutant form of the enzyme which exhibits increased mutator activity. Mutant forms of AID can contain a strong nuclear import signal (NLS), a mutation that alters the activity of the nuclear export signal, or both.

In one aspect, the mutated AID contains a modified nuclear export sequence made by one or more mutations independently selected at positions 180 to 198 of AID (SEQ ID NO: 311), which one or more mutations enhance mutator activity of the modified AID.

In one embodiment of the mutated AID, the modified nuclear export sequence contains one or more mutations at amino acid residue positions Leucine 181 (L181), Leucine 183 (L183), Leucine 189 (L189), Leucine 196 (L196) and/or Leucine 198 (L198). In one non-limiting example, each of the leucine residues in the nuclear export sequence can be mutated to an alanine.

In one embodiment, each of Leucines 181, 183, 189, 196 and 198 can be independently substituted with glycine, alanine, isoleucine, valine, serine, threonine, aspartate or lysine. In another embodiment, each Leucine can be independently substituted with glycine, alanine or serine. In one embodiment, the mutant protein comprises at least one, at least two, at least three or at least four mutations selected from L181A, L183A, L189A, L196A and L198A.
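Substitutions written in this notation (wild-type residue, 1-based position, new residue) can be applied to a protein sequence programmatically. The sketch below is a generic helper and does not include the AID sequence itself; the name `apply_substitutions` is hypothetical, and the short peptide in the usage note is a toy example.

```python
import re

def apply_substitutions(protein, substitutions):
    """Apply point substitutions such as 'L181A' to a protein sequence.

    Each substitution is checked against the input sequence so that a wrong
    position or wild-type residue raises an error instead of silently
    producing the wrong mutant.
    """
    residues = list(protein)
    for sub in substitutions:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if not m:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type, position, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if protein[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}: {sub}")
        residues[position - 1] = new
    return "".join(residues)
```

For example, `apply_substitutions("MLDLT", ["L2A", "L4A"])` returns `"MADAT"` for the toy peptide; with the full-length AID sequence the same call would accept `["L181A", "L183A"]`, provided leucine is present at those positions.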

In one embodiment of the mutated AID, the modified nuclear export sequence comprises a mutation at one or more of amino acid residue positions aspartate 187 (D187), aspartate 188 (D188), aspartate 191 (D191) and/or threonine 195 (T195). In one non-limiting example, the modified nuclear export sequence comprises a D187E mutation. In one non-limiting example, each of the aspartate residues in the nuclear export sequence can be mutated to a glutamate. In another non-limiting example, threonine 195 can be mutated to isoleucine.

In one embodiment, each of aspartates 187, 188, and 191 can be independently substituted with serine, threonine, glutamate, asparagine or glutamine. In another embodiment, each aspartate can be independently substituted with glutamate or asparagine. In one aspect, the mutant protein comprises at least one, at least two, at least three or at least four mutations selected from D187E, D188E, D191E, T195I and L198A.

Mutated AID polypeptides can also contain a nuclear localization signal which can be N-terminal or C-terminal. In one non-limiting example, a mutated AID can contain a strong nuclear localization signal such as, but not limited to, PKKKRKV (SEQ ID NO: 340). In another non-limiting example, the NLS can be a sequence conforming to the motif K-K/R-X-K/R.
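The motif K-K/R-X-K/R translates directly into a regular expression, which gives a quick way to scan a polypeptide for candidate NLS sequences. This is a minimal sketch; the name `find_nls` is hypothetical, and matches are reported without overlap.

```python
import re

# K, then K or R, then any residue, then K or R.
NLS_MOTIF = re.compile(r"K[KR].[KR]")

def find_nls(protein):
    """Return (1-based position, matched residues) for each motif match."""
    return [(m.start() + 1, m.group(0)) for m in NLS_MOTIF.finditer(protein.upper())]
```

Applied to PKKKRKV, `find_nls` reports a match (KKKR) beginning at residue 2, consistent with that sequence conforming to the motif.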

In another aspect, the mutated AID contains both a strong NLS and a modified nuclear export sequence. In one non-limiting example, the modified nuclear export sequence contains one or more of the following mutations: L181A, L183A, L189A, L196A and L198A. In another non-limiting example, the modified nuclear export sequence contains one or more of the following mutations: D187E, D188E, D191E, T195I and L198A.

In any of these mutant forms of AID, the gene may be SHM resistant, SHM susceptible, or can include the appropriate optimal codon usage for expression of the AID in the host cell of choice, without regard for SHM susceptibility. When used in an expression system to target SHM to a protein of interest, the mutant form of AID can be SHM resistant.

In another embodiment, the SHM resistant gene is a selectable marker gene. In one non-limiting embodiment the selectable marker gene is selected from the group consisting of tetracycline, ampicillin, blasticidin, puromycin, hygromycin, kanamycin, DHFR, neomycin, Zeocin™, thymidine kinase, hypoxanthine-guanine phosphoribosyltransferase, adenine phosphoribosyltransferase and adenosine deaminase.

In one aspect, the SHM resistant gene is the tetracycline resistance gene having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 25.

In one aspect, the SHM resistant gene is the blasticidin resistance gene having a nucleic acid sequence that is at least 90-95% identical to SEQ ID NO: 26.

In one aspect, the SHM resistant gene is the puromycin resistance gene having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 27.

In one aspect, the SHM resistant gene is the hygromycin resistance gene having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 28.

In one aspect, the SHM resistant gene is the neomycin resistance gene having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 29.

In one aspect, the SHM resistant gene is the Zeocin resistance gene having a nucleic acid sequence that is at least 90% identical to SEQ ID NO: 30.

In one aspect, the SHM resistant gene is thymidine kinase having a nucleic acid sequence that is at least 95% identical to SEQ ID NO: 31.

Also included in the present invention are optimized versions of particular genes of interest that are specifically susceptible to SHM mediated inactivation, and for which the activation of SHM can be used to selectively knock out the function of the genes. For example, the gene of interest can include one or more hot spot motifs that introduce a stop codon as a result of SHM mediated mutagenesis.
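Codons that lie a single deamination away from a stop codon can be located with a simple scan. The sketch below assumes that AID-mediated cytidine deamination appears on the coding strand as a C-to-T transition, or as G-to-A when the opposite strand is deaminated. It does not check whether the codon sits within a hot spot motif, and the name `stop_prone_codons` is hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
# Assumed coding-strand outcomes of cytidine deamination on either strand.
TRANSITIONS = {"C": "T", "G": "A"}

def stop_prone_codons(orf):
    """Return (codon number, codon, resulting stop) for every in-frame codon
    that a single C->T or G->A transition would convert into a stop codon."""
    orf = orf.upper()
    found = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOPS:
            continue
        for j, base in enumerate(codon):
            if base in TRANSITIONS:
                mutant = codon[:j] + TRANSITIONS[base] + codon[j + 1:]
                if mutant in STOPS:
                    found.append((i // 3 + 1, codon, mutant))
    return found
```

Under these assumptions only CAA, CAG, CGA and TGG codons qualify, so a gene intended for SHM mediated knock-out could be enriched for such codons where the encoded amino acids permit.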

Also provided herein are SHM optimized polynucleotide sequences, or portions thereof, encoding various proteins of interest that enable improved versions of the proteins to be rapidly evolved using the SHM systems of the present invention. For example, the proteins, or portions thereof, encoded by said polynucleotides include antibodies that can be iteratively evolved to higher affinity, selectivity, stability or solubility.

Other exemplary polynucleotide sequences to be optimized by SHM include, but are not limited to, those encoding the following proteins: toxins, growth factors, neurotransmitters, co-factors, transcription factors (e.g., zinc finger binding proteins), and receptors (e.g., an Fc receptor), all of which can be evolved using the present invention. Specific proteins of interest, and their evolution to improved forms, are discussed in detail in Section X.

B. Vector Systems

Provided herein are replicons for use in any of the SHM systems described herein to facilitate the selective modification of target nucleic acid sequences, genes, or portions thereof, while repressing the mutagenesis of structural proteins, resistance markers and other factors for SHM.

Such replicons can include at least one synthetic polynucleotide sequence that is resistant to SHM. In one aspect, the replicon can include at least one unmodified, or synthetic, polynucleotide sequence of interest that is designed, or contains, nucleotide sequences that are optimized for SHM. In some embodiments, the replicon can include expression control sequences to enable the expression of one or more polynucleotides of interest in the mutator cell line.

Suitable replicons can be based on any known viral or non-viral vector, or an artificial chromosome. An expression system can include any combination of different replicons which can be used in sum to create a coordinated system for SHM.

In one aspect, a replicon suitable for the present invention is created by the insertion, or replacement, of at least one polynucleotide sequence with a synthetic polynucleotide sequence that is resistant to SHM.

In another aspect, a replicon suitable for the present invention is created by the insertion, or replacement, of at least one polynucleotide sequence with a synthetic polynucleotide sequence that is optimized for SHM.

In another aspect, a replicon suitable for the present invention is created by the insertion, or replacement, of at least one polynucleotide sequence with a synthetic polynucleotide sequence that is optimized for SHM, and said replicon also includes at least one polynucleotide sequence with a synthetic polynucleotide sequence that is resistant to SHM. In a preferred embodiment, the replicon is capable of expression of one or more synthetic polynucleotide sequences within the mutator cell line.

Suitable vectors can be based on any known episomal vector or integrating vector, including those described herein, known in the art, or discovered or designed in the future.

Representative commercially available viral expression vectors include, but are not limited to, adenovirus-based systems, such as the Per.C6 system available from Crucell, Inc., lentiviral-based systems such as pLP1 from Invitrogen, and retroviral vectors such as pFB-ERV and pCFB-EGSH from Stratagene.

An episomal expression vector is able to replicate in the host cell, and persists as an extrachromosomal episome within the host cell in the presence of appropriate selective pressure (see, for example, Conese et al., Gene Therapy 11: 1735-1742 (2004)). Representative commercially available episomal expression vectors include, but are not limited to, episomal plasmids that utilize Epstein Barr Nuclear Antigen 1 (EBNA1) and the Epstein Barr Virus (EBV) origin of replication (oriP); specific examples include the vectors pREP4, pCEP4 and pREP7 from Invitrogen. The host range of EBV based vectors can be increased to virtually any eukaryotic cell type through the co-expression of EBNA1 binding protein 2 (EBP2) (Kapoor et al., EMBO J. 20: 222-230 (2001)). The vectors pcDNA3.1 from Invitrogen and pBK-CMV from Stratagene represent non-limiting examples of episomal vectors that use T-antigen and the SV40 origin of replication in lieu of EBNA1 and oriP.

An integrating expression vector can randomly integrate into the host cell's DNA, or can include a recombination site to enable the specific recombination between the expression vector and the host cell's chromosome. Such integrating expression vectors can utilize the endogenous expression control sequences of the host cell's chromosomes to effect expression of the desired protein. Examples of vectors that integrate in a site specific manner include, for example, components of the flp-in system from Invitrogen (e.g., pcDNA™5/FRT), or the cre-lox system, such as can be found in the pExchange-6 Core Vectors from Stratagene. Examples of vectors that integrate into host cell chromosomes in a random fashion include, for example, pcDNA3.1 (when introduced in the absence of T-antigen) from Invitrogen, and pCI or pFN10A (ACT) Flexi® from Promega.

Alternatively, the expression vector can be used to introduce and integrate a strong promoter or enhancer sequences into a locus in the cell so as to modulate the expression of an endogenous gene of interest (Capecchi M R. Nat Rev Genet. (2005); 6 (6):507-12; Schindehutte et al., Stem Cells (2005); 23 (1):10-5). This approach can also be used to insert an inducible promoter, such as the Tet-On promoter (U.S. Pat. Nos. 5,464,758 and 5,814,618), into the genomic DNA of the cell so as to provide inducible expression of an endogenous gene of interest. The activating construct can also include targeting sequence(s) to enable homologous or non-homologous recombination of the activating sequence into a desired locus specific for the gene of interest (see, for example, Garcia-Otin & Guillou, Front Biosci. (2006) 11:1108-36). Alternatively, an inducible recombinase system, such as the Cre-ER system, can be used to activate a transgene in the presence of 4-hydroxytamoxifen (Indra et al. Nuc. Acid. Res. (1999) 27 (22): 4324-4327; Nuc. Acid. Res. (2000) 28(23): e99; and U.S. Pat. No. 7,112,715).

Elements to be included in an expression vector are well known in the art, and any existing vector can be readily modified for use in the present invention, for example, through the insertion or replacement of one or more polynucleotide sequences with synthetic polynucleotide sequences as described above.

An expression vector of the present invention can include one or more of the following elements operatively linked together, either on a single replicon, or within multiple replicons: expression control sequences, polynucleotides comprising an open reading frame of a gene of interest, transcription termination signals, origin of replication, and selectable marker genes.

Expression vectors can also include one or more internal ribosomal entry sites; one or more tags for ease of purification of the protein encoded by the gene of interest (e.g., VSV tag, HA tag, 6×His tag, FLAG tag); one or more reporter genes; EBNA1 and/or EBP2 (in cases where episomal replication is desired for an oriP based vector; T-antigen can also be used in conjunction with the SV40 ori as an alternative to EBNA1 plus oriP); one or more moieties for copy analysis, such as portions of a gene that can be used to verify copy number of the vector relative to the host cell's chromosomal DNA (e.g., glucose-6-phosphate dehydrogenase (hG6PDH) or variants thereof); and one or more enzymes or factors for SHM (e.g., AID, pol eta, UDG, enhancer sequences).

Expression vectors can also include anti-sense, ribozyme or siRNA polynucleotides to reduce the expression of target sequences such as, for example, to reduce the level of Pol beta (see, e.g., Sioud M, & Iversen, Curr. Drug Targets (2005) 6 (6):647-53; Sandy et al., Biotechniques (2005) 39 (2):215-24).

It may be desirable in some instances to convert a surface displayed protein into a secreted protein for further characterization. Conversion can be accomplished through the use of a specific linker that can be cleaved by incubation with a selective protease such as factor X, thrombin or any other selective proteolytic agent. It is also possible to include polynucleotide sequences in the vector that enable the genetic manipulation of the encoded protein (i.e., that allow excision of a surface attachment signal from the protein reading frame). For example, one or more unique restriction sites, cre/lox elements, or other recombination elements can be inserted to enable the selective removal of an attachment signal and subsequent intracellular accumulation (or secretion) of the protein of interest at will. Further examples include the insertion of flanking loxP sites around an attachment signal (such as a transmembrane domain), allowing for efficient cell surface expression of a protein of interest. Upon expression of the cre recombinase in the cell, however, recombination occurs between the loxP sites, resulting in the loss of the attachment signal and thus leading to the secretion of the protein of interest.

A plasmid encoding the cre recombinase protein (open reading frame synthesized by DNA2.0 and inserted into an expression vector) can be transiently transfected (or virally transduced) into a cell population of interest. Action by the expressed cre recombinase protein leads to the in situ removal of the transmembrane portion of the coding region, resulting in the translation and production of a secreted form of the protein in the transfected cell population, which can then be used for further studies.

The order and number of elements can be determined by one of ordinary skill in the art.

Example 5 below provides a brief description of the vectors that could be used in any of the SHM systems provided herein.

Briefly, in one aspect, provided herein is an episomal hypermutation competent expression vector comprising: one or more origins (e.g., a prokaryotic ori, such as colE1, and one or more eukaryotic origins, such as oriP, or SV40-ori, or both oriP and SV40 ori), one or more selectable markers (e.g., an ampicillin resistance gene), one or more b-actin or G6PDH fragments or variants thereof, one or more promoters (e.g., a pCMV promoter) at least one of which drives the transcription of a gene or genes in which hypermutation is desired, one or more restriction sites for insertion of a nucleic acid sequence or gene, optionally including one or more secretion signals, attachment signals, purification tags (e.g., a hemagglutinin (HA) tag) for the gene(s) of interest, an internal ribosomal entry site (IRES), one or more puromycin genes, and one or more transcriptional termination signals. The transcriptional termination signal can include a region of 3′ untranslated region, an optional intron (also referred to as intervening sequence or IVS) and one or more polyadenylation signals (p(A)). In one non-limiting example, episomal expression vectors can have the nucleic acid sequence for a fluorescent protein inserted into the vector for SHM, to increase or decrease fluorescence or to alter wavelength of absorption or emission.

In another aspect, provided herein is an integrating expression vector comprising a recombination system, and one or more of the following elements operatively linked together: expression control sequences, polynucleotides comprising an open reading frame of a gene of interest, transcription termination signals, origin of replication, and selectable marker genes.

Expression vectors can also include one or more internal ribosomal entry sites, one or more tags for ease of purification of the protein encoded by the gene of interest (e.g., VSV tag, HA tag, 6×His tag, FLAG tag), one or more reporter genes, one or more moieties for copy analysis, and one or more enzymes or factors for SHM (e.g., AID, pol eta, UDG, Ig enhancer sequences).

In one non-limiting embodiment, the expression vector is designed to enable the insertion of a gene of interest into a genomic locus, for example an Ig gene locus.

C. Systems for Transcription and Hypermutation.

1. In Vitro Expression and Hypermutation Systems

In vitro expression and hypermutation systems include cell free systems that enable the transcription, or coupled transcription and translation, of DNA templates and on-going mutagenesis via SHM. In one embodiment, such in vitro translation and hypermutation systems can be used in combination with ribosome display to enable the ongoing mutagenesis and selection of proteins.

In vitro translation systems include, for example, the classical rabbit reticulocyte system, as well as novel cell free synthesis systems (J. Biotechnol. (2004) 110 (3) 257-63; Biotechnol Annu. Rev. (2004) 101-30). Systems for ribosome display are described, for example, in Villemagne et al., J. Imm. Meth. (2006) 313 (1-2) 140-148.

In one aspect, an in vitro hypermutation system can comprise a polynucleotide, or library of polynucleotides, that includes an expression cassette for the expression of a gene of interest. The gene of interest can be a synthetic or semi-synthetic gene comprising a sequence that has been optimized for SHM. For ribosome display, the polynucleotide can lack a stop codon so that it remains attached to the ribosome after translation.
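By way of non-limiting illustration, the absence of a stop codon in a construct intended for ribosome display can be verified as sketched below. The sketch assumes a DNA coding sequence supplied in frame from its first codon; the names `in_frame_stops` and `strip_terminal_stop` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(orf: str) -> list:
    """Offsets of every in-frame stop codon in a coding sequence."""
    orf = orf.upper()
    return [
        i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in STOP_CODONS
    ]


def strip_terminal_stop(orf: str) -> str:
    """Return the coding sequence without its terminal stop codon.

    Raises ValueError if the length is not a multiple of three, or if an
    internal in-frame stop codon remains after the terminal one is removed.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    if orf[-3:] in STOP_CODONS:
        orf = orf[:-3]
    remaining = in_frame_stops(orf)
    if remaining:
        raise ValueError(f"internal stop codon(s) at offsets {remaining}")
    return orf
```

For example, `strip_terminal_stop("ATGGCCTAA")` would return the stop-free sequence `"ATGGCC"`.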

To effect transcription and/or translation of the gene of interest, the system would include purified or semi-purified components for in vitro transcription and translation, for example via the use of recombinant factors with purified 70S ribosomes. To enable on-going SHM, the system would further include recombinant, or purified, AID and/or other factors for SHM/DNA repair. Optimized proteins would be selected via functional selection as described for surface displayed proteins, and the polynucleotides associated with the selected ribosomes would then be sequenced to determine the identity of favorable mutations.

Provided herein is an in vitro hypermutation system, comprising: a) a polynucleotide comprising a synthetic gene; b) a recombinant AID; and c) an in vitro expression system. In one aspect, the synthetic gene has been optimized for SHM. The in vitro system can further comprise a polymerase to amplify nucleic acids after transcription. The in vitro system can further comprise an in vitro translation system. In one aspect, the polynucleotide is located in an expression vector such as any one of the vectors described elsewhere herein. The in vitro system can further comprise a cell population of a cell as described elsewhere herein.

Provided herein is a kit for in vitro mutagenesis, comprising: a) a recombinant AID protein; b) one or more reagents for in vitro transcription; and c) instructions for design or use of a synthetic gene. The kit can further comprise one or more reagents for in vitro translation. The kit can further comprise an expression vector such as, for example, any one of the expression vectors as described herein. The kit can further comprise a cell population of a cell as described elsewhere herein.

2. Cell Expression and Hypermutation Systems

Cell based expression and hypermutation systems include any suitable prokaryotic or eukaryotic expression systems. Preferred systems are those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems and can be transformed or transfected easily and efficiently.

a. Prokaryotic Expression Systems

Within these general guidelines, useful microbial hosts include, but are not limited to, bacteria from the genera Bacillus, Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella and Erwinia, as well as Bacillus subtilis, Bacillus brevis, and the various strains of Escherichia coli (e.g., HB101 (ATCC No. 33694), DH5α, DH10 and MC1061 (ATCC No. 53338)).

b. Yeast

Many strains of yeast cells known to those skilled in the art are also available as host cells for the expression of polypeptides, including those from the genera Hansenula, Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces, and other fungi. Preferred yeast cells include, for example, Saccharomyces cerevisiae and Pichia pastoris.

c. Insect Cells

Additionally, where desired, insect cell systems can be utilized in the methods of the present invention. Such systems are described, for example, by Kitts et al., Biotechniques, 14:810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4:564-572 (1993); and Luckow et al., J. Virol., 67:4566-4579 (1993). Preferred insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.).

d. Mammalian Expression Systems

A number of suitable mammalian host cells are also known in the art and many are available from the American Type Culture Collection (ATCC), 10801 University Boulevard, Manassas, Va. 20110-2209. Examples include, but are not limited to, mammalian cells such as Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77:4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), or 3T3 cells (ATCC No. CCL92). The selection of suitable mammalian host cells and methods for transformation, culture, amplification, screening and product production and purification are known in the art. Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 (ATCC No. CRL1651) cell lines, and the CV-1 cell line (ATCC No. CCL70). Further exemplary mammalian host cells include primate cell lines and rodent cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable. Candidate cells can be genotypically deficient in the selection gene, or can contain a dominantly acting selection gene. Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, mouse L-929 cells, 3T3 lines derived from Swiss, Balb-c or NIH mice, and BHK or HaK hamster cell lines, which are available from the ATCC. Each of these cell lines is known by and available to those skilled in the art of protein expression.

Also of interest are lymphoid, or lymphoid derived, cell lines, such as a cell line of pre-B lymphocyte origin. Specific examples include, without limitation, RAMOS (CRL-1596), Daudi (CCL-213), EB-3 (CCL-85), DT40 (CRL-2111), 18-81 (Jack et al., PNAS (1988) 85: 1581-1585), Raji cells (CCL-86), and derivatives thereof.

For use in an SHM system, any of the vectors described herein can be co-transfected into a host cell with a separate vector containing the nucleic acid sequence of AID. In one aspect, the vectors described herein can be transfected into a host cell that contains endogenous AID. In another aspect, the vectors described herein can be co-transfected into a host cell that contains endogenous AID with a separate vector containing the nucleic acid sequence of AID such that AID is over-expressed in the cell. In yet another aspect, the vectors described herein can be modified to include the sequence of AID for transfection into a host cell that does, or does not, contain endogenous AID. In a preferred embodiment, the AID is a synthetic AID that comprises a polynucleotide sequence that is SHM resistant.

In one embodiment, the SHM system comprises one or more of the following selected from among: i) at least one polynucleotide that comprises either a polynucleotide that has been altered, in whole or part, from the wild type polynucleotide to positively influence the rate of SHM experienced by that polynucleotide, or a polynucleotide that has a naturally high percentage of hot spots prior to any modification; and ii) at least one component of the expression system that comprises a polynucleotide that has been altered, in whole or part, to negatively influence the rate of SHM.

In one aspect, the SHM system comprises one or more polynucleotides that have been altered from wild-type to negatively influence the rate of SHM. The polynucleotides can encode, for example, one or more factors for SHM (e.g., AID, Pol eta, UDG), one or more selectable marker genes, or one or more reporter genes.

In another aspect, the SHM system comprises one or more polynucleotides that have been altered, in whole or part, from wild-type to positively influence the rate of SHM. The polynucleotide can be, for example, a polynucleotide of interest encoding an enzyme, receptor, transcription factor, structural protein, toxin, co-factor, or specific binding protein of interest.

In yet another aspect, the SHM system comprises a polynucleotide having an intrinsically high rate of SHM such as, for example, a polynucleotide of interest encoding an immunoglobulin heavy chain or an immunoglobulin light chain, or a hypervariable region of an antibody gene.

An SHM system as described herein can further comprise one or more of the following additional elements selected from among: i) an inducible system to regulate the expression of AID, or an AID homolog, ii) one or more Ig enhancers, iii) one or more E-boxes, iv) one or more auxiliary factors for SHM, v) one or more factors for stable episomal expression, such as EBNA1, EBP2 or ori-P, vi) one or more selectable marker genes, vii) one or more secondary vectors containing the gene for AID, and viii) a combination thereof.

In one aspect, the system includes two polynucleotides of interest in which both polynucleotides are located in proximity to a promoter, and expressed and co-evolved in the same cell simultaneously. In one embodiment, the promoter is a bi-directional promoter such as a bi-directional CMV promoter. In another embodiment, the two polynucleotides of interest are placed in front of two uni-directional promoters. The two promoters can be the same promoter or different promoters. The two polynucleotides of interest can be in the same vector or on different vectors.

Following recombinant introduction of one or more polynucleotides of interest into an expression vector, the vector can be amplified, purified, introduced into a host cell using standard transfection techniques and characterized using standard molecular biological techniques. Purified plasmid DNA can be introduced into a host cell using standard transfection/transformation techniques and the resulting transformants/transfectants grown in appropriate medium containing antibiotics, selectable agents and/or activation/transactivator signals (e.g., inducible agents such as doxycycline) to induce expression of the polynucleotides of interest. If the host cell endogenously expresses AID, the vector containing the one or more polynucleotides of interest can be introduced alone into the host cell. Alternatively, if the cell does not endogenously express AID, then the AID gene, or more preferably, a synthetic AID gene that is more resistant to SHM than wild-type, can be transfected into the host cell. Thus, AID expression can be achieved by using either the same, or a different, expression vector as described above for the polynucleotide of interest.

Enhancers (e.g., Ig enhancers) can be inserted into a vector to increase expression, and/or targeting of SHM to the polynucleotide of interest.

If an inducible system is used, such as the Tet-controlled system, doxycycline can be added to the medium to induce expression of the polynucleotide of interest, or AID, for a period of time (e.g., 1 hour (hr), 2 hrs, 4 hrs, 6 hrs, 8 hrs, 10 hrs, 15 hrs, 20 hrs, 24 hrs or any other time) prior to analysis by an appropriate assay. The cells can be allowed to grow for a certain time to provide for on-going diversification, for example, for 1-3 cell generations, or in certain cases 3-6 generations, or in some cases 6 to 10 generations, or longer.

Cells can be iteratively grown, assayed and selected as described herein to selectively enrich those cells that express a polynucleotide of interest exhibiting a desired property. Suitable assay and enrichment strategies (e.g., fluorescence-activated cell sorting (FACS), affinity separation, enzyme activity, toxicity, receptor binding, growth stimulation, etc.) are described below.

Once a population of cells has been obtained that is of interest, the polynucleotides of interest can be rescued and the corresponding mutations sequenced and identified. For example, total mRNA, or extrachromosomal plasmid DNA, can be amplified by co-expression of SV40 T antigen (J. Virol. (1988) 62 (10) 3738-3746) and/or can be extracted from cells and used as a template for polymerase chain reaction (PCR) or reverse transcriptase (RT)-PCR to clone the modified polynucleotide using appropriate primers. Mutant polynucleotides can be sub-cloned into a vector and expressed in E. coli. A tag (e.g., His-6 tag) can be added to the carboxy terminus to facilitate protein purification using chromatography.

X. Proteins of Interest

As used herein, the term “proteins of interest” relates to proteins, or portions thereof, for which it is desired that the polynucleotide encoding the protein is optimized for SHM by AID in order to rapidly create, select and identify improved variants of that protein. Such optimized polynucleotide can be made more susceptible to SHM as a result of codon usage, thereby inducing amino acid changes when the polynucleotide is subjected to AID, and screened for improved function. Conversely, such optimized polynucleotide can be made more resistant to SHM as a result of codon usage, thereby decreasing amino acid changes when the polynucleotide is subjected to AID, and screened for improved function.

It should be understood, however, that the present invention also includes compositions and methods that relate to polynucleotide sequences that encode proteins that are resistant to SHM, or comprise codons that can be converted to stop codons. Such synthetic genes confer certain advantages in the present invention and are specifically disclosed, for example, in Section IX.

Any protein for which the amino acid, or corresponding nucleotide, sequence is known or available (e.g., can be cloned into a vector of the present invention), and for which a phenotype or function can be improved, is a candidate for use in the vectors and SHM systems provided herein. Proteins of interest include, for example, surface proteins, intracellular proteins, membrane proteins and secreted proteins from any unmodified or synthetic source. Exemplary, but non-limiting, types of proteins for use in the vectors and SHM systems provided herein include an antibody heavy chain or portion thereof, an antibody light chain or portion thereof, an enzyme, a receptor, a structural protein, a co-factor, a polypeptide, a peptide, an intrabody, a selectable marker, a toxin, a growth factor, a peptide hormone, and any other protein which can be optimized.

Biologically active proteins (molecules) also include molecules capable of modulating the pharmacokinetics and/or pharmacodynamics of other biologically active proteins (molecules), for example, lipids and polymers such as polyamines, polyamides, polyethylene glycol and other polyethers. Exemplary polypeptides include, for example, VEGF, VEGF receptor, Diphtheria toxin subunit A, B. pertussis toxin, CC chemokines (e.g., CCL1-CCL28), CXC chemokines (e.g., CXCL1-CXCL16), C chemokines (e.g., XCL1 and XCL2) and CX₃C chemokines (e.g., CX₃CL1), IFN-gamma, IFN-alpha, IFN-beta, TNF-alpha, TNF-beta, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-10, IL-12, IL-13, IL-15, TGF-beta, TGF-alpha, GM-CSF, G-CSF, M-CSF, TPO, EPO, human growth factor, fibroblast growth factor, nuclear co-factors, Jak and Stat family members, G-protein signaling molecules such as chemokine receptors, JNK, Fos-Jun, NF-κB, I-κB, CD40, CD4, CD8, B7, CD28 and CTLA-4.

Additionally, there are a variety of other component nucleotide sequences, such as coding sequences and genetic elements, that can make up the core system and that one would, in some embodiments, prefer not to hypermutate in order to maintain overall system integrity. These component nucleotide sequences include, without limitation: i) selectable markers such as neomycin, blasticidin, ampicillin, etc.; ii) reporter genes (e.g., fluorescent proteins, epitope tags, reporter enzymes); iii) genetic regulatory signals, e.g., promoters, inducible systems, enhancer sequences, IRES sequences, transcription or translational terminators, Kozak sequences, splice sites, origins of replication, repressors; iv) enzymes or accessory factors used for high level enhanced SHM, or its regulation or measurement, such as AID, pol eta, transcription factors, and MSH2; v) signal transduction components (kinases, receptors, transcription factors); and vi) domains or sub-domains of proteins such as nuclear localization signals, transmembrane domains, catalytic domains, protein-protein interaction domains, and other protein family conserved motifs, domains and sub-domains.

One could, based on the present application, select a protein of interest as a suitable candidate for optimization, and devise a suitable assay to monitor the desired trait of the protein of interest.

Depending on the nature of the protein of interest, and the amount of information available on the protein of interest, a practitioner can follow any combination of the following strategies prior to mutagenesis to create the optimized polynucleotide.

1. No optimization: Although it can be desirable to enhance the number of hot spots within the polynucleotide sequence encoding a protein of interest, it should be noted that any unmodified protein is expected to undergo a certain amount of SHM, and can be used in the present invention without optimization, or any specific knowledge of the actual sequence. Additionally, certain proteins, for example antibodies, naturally comprise polynucleotide sequences which have evolved suitable codon usage, and do not require codon modification. Alternatively, it can be desirable to enhance the number of cold spots within the polynucleotide sequence encoding a protein of interest (e.g., framework regions of antibodies or fragments thereof).

2. Global hot spot optimization: In some aspects, the number of hot spots in a polynucleotide encoding a protein can be increased, as described herein. This approach can be applied to the entire coding region of the gene, thereby rendering the entire protein more susceptible to SHM. As discussed herein, this approach can be preferred if relatively little is known about structure-activity relationships within the protein, or between related protein isotypes.

3. Selective hot spot modification: Alternatively, as discussed herein, a polynucleotide sequence encoding the protein of interest can be selectively and/or systematically modified through the targeted replacement of regions of interest with synthetic variable regions, which provide for a high density of hot spots and seed maximal diversity through SHM at specific loci.

One of ordinary skill in the art would understand, based on the teaching provided herein, that any or all of the above approaches can be undertaken using the present invention. Approaches relating to global hot spot optimization, and selective hot spot modification, are however likely to lead to faster and more efficient optimization of protein function.
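By way of non-limiting illustration, the hot spot content of a candidate polynucleotide can be estimated before and after any of the above strategies is applied. The following sketch assumes the commonly cited hot spot motif WRCY and its reverse complement RGYW (W = A or T, R = A or G, Y = C or T); the motif choice and the names `hot_spot_starts` and `hot_spot_density` are assumptions made for illustration, and other motif definitions can be substituted.

```python
import re

# Assumed motifs: WRCY and its reverse complement RGYW, written as regular
# expressions inside lookaheads so that overlapping occurrences are found.
HOT_SPOT_PATTERNS = (
    re.compile(r"(?=[AT][AG]C[CT])"),  # WRCY
    re.compile(r"(?=[AG]G[CT][AT])"),  # RGYW
)


def hot_spot_starts(seq: str) -> list:
    """Sorted start offsets of every 4-mer matching either assumed motif."""
    seq = seq.upper()
    starts = set()
    for pattern in HOT_SPOT_PATTERNS:
        starts.update(match.start() for match in pattern.finditer(seq))
    return sorted(starts)


def hot_spot_density(seq: str) -> float:
    """Hot spot motif starts per 100 nucleotides (0.0 for an empty input)."""
    if not seq:
        return 0.0
    return 100.0 * len(hot_spot_starts(seq)) / len(seq)
```

Comparing `hot_spot_density` for an unmodified sequence and for a synonymous, codon-modified version gives a rough measure of how far a global or selective modification has shifted the sequence toward, or away from, susceptibility to SHM.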

Following the design of an optimized polynucleotide encoding the protein of interest, it can be synthesized using standard methodology and sequenced to confirm correct synthesis. Once the sequence of the polynucleotide has been confirmed, the polynucleotide can be inserted into a vector of the present invention, and the vector then introduced into a host cell as described herein to effect mutagenesis.

Once introduced into a suitable host cell, cells can be induced to express AID, and/or other factors to initiate SHM, thereby inducing on-going sequence diversification of the protein of interest. After an appropriate period of time (e.g., 2-10 cell divisions), the resulting host cells, including variants of the protein of interest, can be screened and improved mutants identified and separated from the cell population. This process can be iteratively repeated to selectively improve the properties of the protein of interest.
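By way of non-limiting illustration, the iterative cycle of diversification, screening and selection described above can be sketched in silico as follows. This is a toy model only: it assumes that diversification consists solely of random C to T and G to A transitions, and that a user-supplied `score` function stands in for the screening assay. All names and parameter values (`mutate`, `evolve`, `pool_size`, `keep_fraction`, `rate` and so on) are hypothetical and are not taken from the present disclosure.

```python
import random
from typing import Callable, List, Tuple


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Apply C->T and G->A transitions independently at probability `rate`."""
    out = []
    for base in seq:
        if base == "C" and rng.random() < rate:
            out.append("T")
        elif base == "G" and rng.random() < rate:
            out.append("A")
        else:
            out.append(base)
    return "".join(out)


def evolve(seed_seq: str, score: Callable[[str], float], rounds: int = 3,
           divisions_per_round: int = 4, pool_size: int = 200,
           keep_fraction: float = 0.1, rate: float = 0.01,
           rng_seed: int = 0) -> Tuple[str, List[float]]:
    """Repeat diversify -> screen -> select; return best variant and history."""
    rng = random.Random(rng_seed)
    population = [seed_seq] * pool_size
    keep = [seed_seq]
    best_per_round = []
    for _ in range(rounds):
        # Diversification: several rounds of mutation stand in for divisions.
        for _ in range(divisions_per_round):
            population = [mutate(s, rate, rng) for s in population]
        # Screening and selection: keep the top-scoring fraction of variants.
        ranked = sorted(population, key=score, reverse=True)
        keep = ranked[:max(1, int(pool_size * keep_fraction))]
        best_per_round.append(score(keep[0]))
        # Expansion: re-seed the next round from the selected variants.
        population = [keep[i % len(keep)] for i in range(pool_size)]
    return keep[0], best_per_round
```

The returned history records the best score observed in each round, which gives a simple way to judge whether further rounds of selection are still yielding improvement under the chosen assay.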

A cell-surface displayed protein can be created through the creation of a chimeric molecule of a protein of interest coupled in frame to a suitable transmembrane domain. In the case of mammalian cell expression, for example, an MHC type 1 transmembrane domain such as that from H2kk (including peri-transmembrane domain, transmembrane domain, and cytoplasmic domain; NCBI Gene Accession number AK153419) can be used. Likewise, the surface expression of proteins in prokaryotic cells (such as E. coli and Staphylococcus), insect cells, and yeast is well established in the art. For reviews, see, for example, Winter, G. et al., Annu. Rev. Immunol. (1994) 12:433-55; Plückthun, A., (1991) Bio/Technology 9: 545-551; Gunneriusson et al., (1996) J. Bacteriol 78 1341-1346; Ghiasi et al., (1991) Virology 185 187-194; Boder and Wittrup, (1997) Nat. Biotechnol. 15 553-557; and Mazor et al., (2007) Nat. Biotech. 25(5) 563-565.

Surface displayed antibodies or proteins can be created through the secretion and then binding (or association) of the secreted protein on the cell surface. Conjugation of the antibody or protein to the cell membrane can occur either during protein synthesis or after the protein has been secreted from the cell. Conjugation can occur via covalent linkage, by binding interactions (e.g., mediated by specific binding members) or a combination of covalent and non-covalent linkage.

In yet another aspect, proteins can be coupled to a cell through the creation of an antibody or binding protein fusion protein comprising a first specific binding member that specifically binds to a target of interest fused to a second binding member specific for display on a cell surface. For example, the binding of protein A to an Fc domain can be exploited: protein A is expressed on and attached to a cell surface, where it binds to, and localizes, a secreted antibody (or a protein of interest expressed as an Fc fusion protein).

Transfection of appropriate expression vectors containing the corresponding polynucleotide sequences into suitable mutator positive cells can be performed using any art recognized or known transfection protocol. An exemplary surface expressed library of proteins is described in Examples 4 and 5 of priority U.S. Application Nos. 60/904,622 and 61/020,124.

Cells expressing a plurality of antibodies or binding proteins from the transfections above can, optionally, be characterized to select cells exhibiting specific ranges of surface expression of the protein using conventional assays including, but not limited to, FACS.

Staining of light and heavy chain expression can be accomplished, for example, by using commercially available fluorescein isothiocyanate (FITC) or R-Phycoerythrin (R-PE) conjugated rat anti-mouse Ig, kappa light chain, and FITC or R-PE conjugated rat anti-mouse Ig G1 monoclonal antibodies (BD Pharmingen). Staining can be performed using the manufacturer's suggested protocols, usually via incubation of the test cells in the presence of labeled antibody for 30 minutes on ice. Levels of cellular antigen expression can be quantified using Spherotech rainbow calibration particles (Spherotech, IL).

Transfected cell populations exhibiting specific ranges of expression can be selected. For example, cells with a surface copy number of greater than about 10,000, about 50,000, about 100,000, or about 500,000 proteins per cell can be selected, and can then be used for efficient affinity profiling.
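The copy-number selection described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the cell identifiers, and the example values are hypothetical, and the sketch assumes that fluorescence readings have already been converted to estimated proteins per cell (for example, against calibration particles).

```python
from typing import Dict, List


def select_by_copy_number(copies_per_cell: Dict[str, float], minimum: float) -> List[str]:
    """Return the IDs of cells whose estimated surface copy number exceeds `minimum`."""
    return [cell_id for cell_id, copies in copies_per_cell.items() if copies > minimum]


# Hypothetical estimates of displayed proteins per cell.
example = {"cell_a": 8_000, "cell_b": 62_000, "cell_c": 140_000, "cell_d": 510_000}

# Keep only cells displaying more than about 50,000 proteins per cell.
selected = select_by_copy_number(example, 50_000)
print(selected)
```

The same function can be called with any of the other thresholds mentioned above (about 10,000, about 100,000, or about 500,000 proteins per cell).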

Populations of stably transfected cells can be created via, for example, growth for 2 to 3 weeks in the presence of appropriate selectable agents; the resulting cell library can be frozen and stored as a cell bank. Alternatively, cells can be transiently transfected and used within a few days of transfection.

It may be desirable in some instances to convert a surface displayed protein into a secreted protein for further characterization. Conversion can be accomplished through the use of a specific linker that can be cleaved by incubation with a selective protease such as factor X, thrombin or any other selective proteolytic agent. It is also possible to include polynucleotide sequences in the vector that enable the genetic manipulation of the encoded protein (i.e., that allow excision of a surface attachment signal from the protein reading frame). Examples include the insertion of one or more unique restriction sites, cre/lox elements, or other recombination elements that enable the selective removal of an attachment signal and subsequent intracellular accumulation (or secretion) of the protein of interest at will. As a further example, loxP sites can be inserted flanking an attachment signal (such as a transmembrane domain); the intact construct allows for efficient cell surface expression of the protein of interest, whereas upon expression of the cre recombinase in the cell, recombination occurs between the loxP sites, resulting in the loss of the attachment signal and thus leading to the secretion of the protein of interest.

Once a polypeptide has been optimized to a determined degree, the cell or population of cells expressing an optimized polypeptide of interest can be isolated or enriched and the phenotype (function) of the optimized polypeptide can be assayed using art-recognized assays.

Cells can then be re-grown, SHM re-induced, and re-screened over a number of cycles to effect iterative improvements in the desired function. At any point, the polynucleotide sequence encoding the protein of interest can be rescued and/or sequenced to monitor on-going mutagenesis.

For example, episomal plasmid DNA can be extracted (optionally after amplification by co-expression with SV40 T Antigen (J. Virol. (1988) 62 (10) 3738-3746)) and then amplified by PCR using DNA primers that are specific for the polynucleotide of interest or flanking regions, using standard methodology. Alternatively, total RNA can be isolated from various cell populations that have been isolated by flow cytometry or magnetic beads; episomal DNA and/or total RNA can then be amplified by PCR or RT-PCR, respectively, using primers that are specific for the polynucleotide of interest or flanking regions, using standard methodology. Clones can be sequenced using automated DNA sequencers from companies such as Applied Biosystems (ABI-377 or ABI 3730 DNA sequencers). Sequences can be analyzed for frequency of nucleotide insertions and deletions compared to the starting sequence.
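The comparison of a sequenced clone with the starting sequence can be illustrated by the following minimal Python sketch. It assumes that the two sequences have already been pairwise aligned by a separate tool, with "-" marking gaps; the function name and the example sequences are hypothetical, and substitutions are tallied alongside insertions and deletions purely for illustration.

```python
from collections import Counter


def count_changes(ref_aligned: str, clone_aligned: str) -> Counter:
    """Tally substitutions, insertions and deletions in a clone relative to
    the starting (reference) sequence. Inputs must be pre-aligned strings of
    equal length, with '-' marking alignment gaps."""
    if len(ref_aligned) != len(clone_aligned):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(substitution=0, insertion=0, deletion=0)
    for ref_base, clone_base in zip(ref_aligned.upper(), clone_aligned.upper()):
        if ref_base == clone_base:
            continue  # unchanged position (or a gap in both sequences)
        if ref_base == "-":
            counts["insertion"] += 1  # base present only in the clone
        elif clone_base == "-":
            counts["deletion"] += 1  # base missing from the clone
        else:
            counts["substitution"] += 1
    return counts


# Hypothetical aligned pair: one substitution, one insertion, one deletion.
changes = count_changes("ACGT-ACGT", "ACTTGAC-T")
print(dict(changes))
```

Dividing each count by the ungapped length of the starting sequence gives a per-base frequency that can be compared between rounds of mutagenesis.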

A. Antibodies and Fragments Thereof

With respect to antibodies, the present invention provides the ability to bypass the need for immunization in vivo to select antibodies that bind to key surface epitopes that are aligned with producing the most robust biological effects on target protein function. Additionally, mammalian antibodies intrinsically possess optimal codon usage patterns for targeted SHM, greatly simplifying template design strategies. For certain antigens, in vivo immunization leads to epitope selection that does not impact target function, thereby hindering the selection of potent and efficacious antibody candidates. In still other embodiments, the present invention can provide for the rapid evolution of site-directed antibodies that have potent activity by nature of the role of that epitope in determining target protein function. This provides the ability to scan target proteins for optimal epitope position and produce best in class antibody drugs for use in the clinic.

As described herein, all naturally occurring germline, affinity matured, synthetic, or semi-synthetic antibodies, as well as fragments thereof, may be used in the present invention. In general, such antibodies can be altered through SHM to improve one or more of the following functional traits: affinity, avidity, selectivity, thermostability, proteolytic stability, solubility, folding, immunotoxicity and expression. Depending upon the antibody format, antibody libraries can comprise separate heavy chain and light chain libraries which can be co-expressed in a host cell. In certain embodiments, full length antibodies can be secreted, and/or surface displayed at the plasma membrane of the host cell. In still other embodiments, heavy and light chain libraries can be inserted into the same expression vector, or different expression vectors, to enable simultaneous co-evolution of both antibody chains.

In one embodiment, increasing the hotspot density in specific subdomains of antibodies or fragments thereof (e.g., F(ab′)₂, Fab′, Fab, Fv, scFv, dsFv, dAb or a single chain binding polypeptide) can result in an improvement in a characteristic such as one or more of increased binding affinity, increased binding avidity and/or decreased non-specific binding. In another embodiment, the use of synthetic antibodies with increased hotspots in the constant domain (e.g., Fc) can result in increased binding affinity for an Fc receptor (FcR), thereby modulating signal cascades. Heavy chains and light chains, or portions thereof, can be simultaneously modified using the procedures described herein.

Intrabodies used in the methods provided herein can be modified to improve or enhance folding of the heavy and/or light chain in the reducing environment of the cytoplasm. Alternatively, or in addition, a sFv intrabody can be modified to stabilize frameworks that could fold properly in the absence of intradomain disulfide bonds. Intrabodies can also be modified to increase, for example, one or more of the following characteristics: binding affinity, binding avidity, epitope accessibility, competition with endogenous proteins for the target epitope, half-life, target sequestration, post-translational modification of the target protein, etc. Because intrabodies act within the cell, assay methodologies for their activity are more analogous to enzyme activity assays, which are discussed below in section B.

Polynucleotide Identification and Design

Methods for designing and creating targeted antibody libraries, as well as methods for identifying optimal epitopes that provide for the selection of antibodies with superior selectivity, cross species reactivity, and blocking activity are known in the art and described herein. Such methods are disclosed in sections IV and V of the present specification, as well as commonly owned priority U.S. Patent Application Nos. 60/904,622 and 61/020,124.

Screening Methodology

Specific screens to detect and select surface exposed or secreted antibodies with improved traits are well known in the art, and are described in detail in section XI. Such screens can involve several rounds of selection based on the simultaneous selection of multiple parameters, for example, affinity, avidity, selectivity and thermostability, in order to evolve the overall best antibody.

Once an antibody or fragment thereof has been optimized using SHM, the phenotype/function of the optimized antibody or fragment thereof can be further analyzed using art-recognized assays. Assays for antibodies or fragments thereof include, but are not limited to, enzyme-linked immunosorbent assays (ELISA), enzyme-linked immunosorbent spot (ELISPOT) assays, gel detection and fluorescent detection of mutated IgH chains, Scatchard analysis, BIACORE analysis, western blots, polyacrylamide gel (PAGE) analysis, radioimmunoassays, etc., which can determine binding affinity, binding avidity, etc. Such assays are more fully described in Section XI below.

Once optimized antibodies have been identified, episomal DNA can be extracted (optionally after amplification by co-expression with SV40 T Antigen (J. Virol. (1988) 62 (10) 3738-3746)) and subjected to PCR using variable heavy chain (V_(H)) leader region and/or variable light chain (V_(L)) leader region specific sense primers and isotype specific anti-sense primers. Alternatively, total RNA from selected sorted cell populations can be isolated and subjected to RT-PCR using variable heavy chain (V_(H)) leader region and/or variable light chain (V_(L)) leader region specific sense primers and isotype specific anti-sense primers. Clones can be sequenced using standard methodologies and the resulting sequences can be analyzed for frequency of nucleotide insertions and deletions, receptor revision and V gene selection. The resulting data can be used to populate a database linking specific amino acid substitutions with changes in one or more of the desired properties. Such databases can then be used to recombine favorable mutations or to design next generation polynucleotide libraries with targeted diversity in newly identified regions of interest, e.g., nucleic acid sequences which encode a functional portion of a protein.

B. Enzymes

Enzymes and pro-enzymes present another category of polypeptides which can be readily improved, and for which SHM is useful. Of particular interest is the application of the present invention to the co-evolution of multiple enzymatic pathways, involving the simultaneous mutation of two or more enzymes. Enzymes and enzyme systems of particular note include, for example, enzymes associated with microbiological fermentation, metabolic pathway engineering, protein manufacture, bio-remediation, and plant growth and development.

Specific high throughput screening systems to measure, select and evolve enzymes with improved traits are well known in the art, and are outlined in Section XI. Such screens can involve several rounds of selection based on the simultaneous selection of multiple parameters, for example, pH stability, Km, Kcat, thermostability, solubility, proteolytic stability, substrate specificity, co-factor dependency, and tendency for hetero or homo dimerization.

Polynucleotide Identification and Design

As described previously, the starting point for mutagenesis is either a cDNA clone of the gene of interest, or its amino acid or polynucleotide sequence. To maximize the effectiveness of SHM, the starting polynucleotide sequence can be modified to maximize the density of hot spots and to reduce the density of cold spots. Such methods are disclosed in sections IV and V of the present specification, as well as commonly owned priority U.S. Patent Application Nos. 60/904,622 and 61/020,124.
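By way of illustration only, the density of hot spot motifs in a candidate sequence might be estimated as in the following Python sketch. The sketch assumes the canonical literature hot spot motif WRCY and its reverse complement RGYW (W = A or T, R = A or G, Y = C or T); the motif definitions actually used in sections IV and V of the specification may differ. The function names and the example sequence are hypothetical.

```python
import re

# Assumed hot spot definition: WRCY or its reverse complement RGYW.
# The zero-width lookahead lets overlapping occurrences be counted.
HOT_SPOT = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")


def count_motifs(seq: str, pattern: "re.Pattern[str]" = HOT_SPOT) -> int:
    """Count (possibly overlapping) start positions at which the motif occurs."""
    return sum(1 for _ in pattern.finditer(seq.upper()))


def motif_density(seq: str, pattern: "re.Pattern[str]" = HOT_SPOT) -> float:
    """Motif start positions per 100 nucleotides of sequence."""
    if not seq:
        return 0.0
    return 100.0 * count_motifs(seq, pattern) / len(seq)


# Hypothetical sequence containing two AGCT hot spots.
print(count_motifs("AGCTAGCT"), motif_density("AGCTAGCT"))
```

A cold spot pattern, as defined by the reader, can be passed through the `pattern` argument so that candidate sequences encoding the same protein can be compared on both measures.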

If particular regions of interest have been defined in the protein or proteins of interest, these areas can be targeted with preferred hot spot motifs. Alternatively, a scanning approach can be used to systematically insert hot spot motifs throughout the reading frame of the enzyme or enzymes of interest, as described previously.

For the co-evolution of a particular enzymatic pathway involving multiple enzymes, a particular advantage of the present invention is the ability to co-evolve all the enzymes simultaneously within a single cell. This approach exploits the ability to identify mutations that only confer an advantage to the overall system when all of the members of the system are present, for example, mutations that induce heterodimerization between enzymes involved in consecutive enzymatic reactions.

Screening Methodology

Many high throughput screening approaches are well known in the art and can be readily applied to identify and select improved enzymes (see, e.g., Olsen et al., Methods. Mol. Biol. (2003) 230: 329-349; Turner, Trends Biotechnol. (2003) 21 (11): 474-478; Zhao et al., Curr. Opin. Biotechnol. (2002) 13 (2): 104-110; and Mastrobattista et al., Chem. Biol. (2005) 12 (12): 1291-300). The screening modality used can depend on the nature of the enzyme and whether the enzyme of interest is intracellular or extracellular, and further whether it is membrane associated or freely secreted.

Initial screens that provide useful quantitative information over a wide dynamic window, and which have a high screening capacity, are preferred. Representative screening approaches include, for example, assays based on the altered ability, or speed of growth of improved cells, and/or based on the sorting of cells using a flow cytometer. FACS based approaches can detect the presence of intracellular fluorogenic reaction products or altered reporter gene expression, and specific protocols for the FACS based optimization of enzyme activity are reviewed in the following references: Farinas et al., Comb. Chem. High Throughput Screen (2006) 9(4): 321-8; Becker et al., Curr. Opin. Biotechnol. (2004) 15(4): 323-9; Daugherty et al., J. Immunol. Methods (2000) 243 (1-2): 211-227.

Once an enzyme or set of enzymes has been optimized using SHM, the optimized enzyme(s) can be subjected to a complete biochemical analysis using art-recognized assays. Additionally, as previously discussed, once optimized enzymes have been identified, episomal DNA can be extracted (optionally after amplification by co-expression with SV40 T Antigen (J. Virol. (1988) 62 (10) 3738-3746)) and subjected to PCR using specific primers. Alternatively, total RNA can be obtained from selected cell populations and subjected to RT-PCR using specific primers. Clones can be sequenced using standard methodologies and the resulting sequences can be analyzed for the frequency of nucleotide mutations. The resulting data can be used to populate a database linking specific amino acid substitutions with changes in one or more of the desired properties. Such databases may then be used to recombine favorable mutations, or to design next generation polynucleotide libraries with targeted diversity in newly identified regions of interest, e.g., nucleic acid sequences which encode functional portions of a protein.

C. Receptors

Receptors bind ligands and encompass a broad genus of unmodified and synthetic polypeptides encoding specific binding members, including, but not limited to, cell-bound receptors such as antibodies (B cell receptors), T cell receptors, Fc receptors, G-protein coupled receptors, cytokine receptors, carbohydrate receptors, and Avimer™ based receptors.

In one embodiment, such receptors can be altered through SHM to improve one or more of the following traits: affinity, avidity, selectivity, thermostability, proteolytic stability, solubility, dimerization, folding, immunotoxicity, coupling to signal transduction cascades and expression.

Polynucleotide Identification and Design

As described previously, the starting point for mutagenesis can be either a cDNA clone of the gene of interest, or its amino acid or polynucleotide sequence. To maximize the effectiveness of SHM, the starting polynucleotide sequence can be modified to maximize the density of hot spots and to reduce the density of cold spots. Such methods are disclosed in sections IV and V of the present specification.

Such receptors possess clearly defined domains that can be either targeted for mutagenesis through the use of SHM optimized sequences, or conserved during mutagenesis through the use of SHM resistant sequences. Domains (regions) targeted for mutagenesis include, but are not limited to, sites of post-translational modification, surface exposed loop domains, positions of variation between species, protein-protein interaction domains, and binding domains. Domains (regions) conserved during mutagenesis include transmembrane domains, invariant amino acid positions, signal sequences, and intracellular trafficking domains. Alternatively, a scanning approach can be used to systematically insert hot spot motifs throughout the reading frame of the receptor of interest, as described previously.

Screening Methodology

Many high throughput screening approaches are well known in the art and can be readily applied to identify and select improved receptors. Representative screening approaches include, for example, binding assays, growth assays, reporter gene assays and FACS based assays.

Once a receptor has been optimized using SHM, the optimized receptor can be subjected to a complete pharmacological analysis using art-recognized assays. Additionally, as previously discussed, once an optimized receptor has been identified, episomal DNA can be extracted (optionally after amplification by co-expression with SV40 T Antigen (J. Virol. (1988) 62 (10) 3738-3746)) and subjected to PCR using specific primers. Alternatively, total RNA can be obtained from selected cell populations and subjected to RT-PCR using specific primers. Clones can be sequenced using standard methodologies and the resulting sequences can be analyzed for the frequency of nucleotide mutations. The resulting data can be used to populate a database linking specific amino acid substitutions with changes in one or more of the desired properties. Such databases may then be used to recombine favorable mutations or to design next generation polynucleotide libraries with targeted diversity in newly identified regions of interest, e.g., nucleic acid sequences which encode functional portions of a protein.

XI. Screening and Enrichment Systems

Polypeptides generated by the expression of the synthetic libraries, semi-synthetic libraries, or seed libraries of polynucleotides described herein can be screened for improved phenotype using a variety of standard physiological, pharmacological and biochemical procedures. Such assays include, for example, biochemical assays such as binding assays, fluorescence polarization assays, solubility assays, folding assays, thermostability assays, proteolytic stability assays, and enzyme activity assays (see generally Glickman et al., J. Biomolecular Screening, 7 No. 1 3-10 (2002); Salazar et al., Methods. Mol. Biol. 230 85-97 (2003)), as well as a range of cell based assays including signal transduction, motility, whole cell binding, flow cytometry and fluorescence activated cell sorting (FACS) based assays. Cells expressing a polypeptide of interest encoded by a synthetic or semi-synthetic library as described herein can be enriched using any art-recognized assay including, but not limited to, methods of coupling peptides to microparticles.

Many FACS and high throughput screening systems are commercially available (see, e.g., Zymark Corp., Hopkinton, Mass.; Air Technical Industries, Mentor, Ohio; Beckman Instruments Inc., Fullerton, Calif.; Precision Systems, Inc., Natick, Mass.) that enable these assays to be run in a high throughput mode. These systems typically automate entire procedures, including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. The manufacturers of such systems provide detailed protocols for various high throughput systems. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like.

A. Cell-based Methods to Measure Activities.

1. Signal Transduction Based Assays

Proteins such as, for example, growth factors, enzymes, receptors and antibodies can influence signal transduction within a cell or cell population, and thereby influence transcriptional activity that can be detected using a reporter gene assay. Such modulators can behave functionally as full or partial agonists, full or partial antagonists, or full or partial inverse agonists.

Thus in one assay format, signal transduction assays can be based on the use of cells comprising a reporter gene whose expression is directly or indirectly regulated by the protein of interest, which can be measured by a variety of standard procedures.

Reporter plasmids can be constructed using standard molecular biological techniques by placing cDNA encoding the reporter gene downstream from a suitable minimal promoter (that is, any sequence that supports transcription initiation in eukaryotic cells) that sits 5′ to the coding sequence of the reporter gene. A minimal promoter can be derived from a viral source such as, for example: SV40 early or late promoters, cytomegalovirus (CMV) immediate early promoters, or Rous Sarcoma Virus (RSV) early promoters; or from eukaryotic cell promoters, for example, beta actin promoter (Ng, Nuc. Acid Res. 17:601-615, 1989; Quitschke et al., J. Biol. Chem. 264:9539-9545, 1989), GAPDH promoter (Alexander, M. C. et al., Proc. Nat. Acad. Sci. USA 85:5092-5096, 1988; Ercolani, L. et al., J. Biol. Chem. 263:15335-15341, 1988), TK-1 (thymidine kinase) promoter, HSP (heat shock protein) promoters, or any eukaryotic promoter containing a TATA box.

A reporter plasmid also typically includes an element 5′ to the minimal promoter that contains a consensus recognition sequence, usually repeated 2 to 7 times in a concatenate, for the appropriate branch of the signal transduction pathway for which monitoring is desired. Examples include, but are not limited to: cyclic AMP response elements (CRE, which responds to changes in intracellular cAMP concentrations, available from Stratagene in phagemid vector pCRE-Luc, Cat. No. 219076), serum response elements (SRE, Stratagene phagemid vector pSRE-Luc, Cat. No. 219080), nuclear factor κB response elements (NF-κB, Stratagene phagemid vector pNFKB-Luc, Cat. No. 219078), activator protein 1 response elements (AP-1, Stratagene phagemid vector pAP-1-Luc, Cat. No. 219074), serum response factor response elements (Stratagene phagemid vector pSRF-Luc, Cat. No. 219082), or p53 binding sites.

Numerous reporter gene systems are known in the art and include, for example, alkaline phosphatase (Berger, J., et al. (1988) Gene 66 1-10; Kain, S. R. (1997) Methods. Mol. Biol. 63 49-60), β-galactosidase (See, U.S. Pat. No. 5,070,012, issued Dec. 3, 1991 to Nolan et al., and Bronstein, I., et al., (1989) J. Chemilum. Biolum. 4 99-111), chloramphenicol acetyltransferase (See Gorman et al., Mol Cell Biol. (1982) 2 1044-51), β-glucuronidase, peroxidase, beta-lactamase (U.S. Pat. Nos. 5,741,657 and 5,955,604), catalytic antibodies, luciferases (U.S. Pat. Nos. 5,221,623; 5,683,888; 5,674,713; 5,650,289; 5,843,746) and naturally fluorescent proteins (Tsien, R. Y. (1998) Annu. Rev. Biochem. 67 509-44).

Alternatively, intermediate signal transduction events that are proximal to gene regulation can also be observed, for example by measuring fluorescent signals from reporter molecules that respond to intracellular changes including, but not limited to, fluctuations in calcium concentration due to release from intracellular stores, alterations in membrane potential or pH, increases in inositol triphosphate (IP₃) or cAMP concentrations, or release of arachidonic acid.

As used herein, agonists refer to modulators that stimulate signal transduction and can be measured using various combinations of the construct elements listed above. As used herein, partial agonists refer to modulators able to stimulate signal transduction to a level greater than background, but less than 100% as compared to a full agonist. A superagonist is able to stimulate signal transduction to greater than 100% as compared to a full agonist reference standard.

As used herein, antagonists refer to modulators that have no influence on signal transduction on their own, but are able to inhibit agonist- (or partial agonist-) induced signaling. As used herein, partial antagonists refer to modulators that have no influence on signal transduction on their own, but are able to inhibit agonist- (or partial agonist-) induced signaling to an extent that is measurable, but less than 100%.

As used herein, inverse agonists refer to modulators that are able to inhibit agonist- (or partial agonist-) induced signaling, and are also able to inhibit signal transduction when added alone.
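The definitions above can be expressed as a simple decision rule, sketched below in Python. The sketch is illustrative only and rests on several assumptions that are not part of the specification: responses are expressed as a percentage of the full agonist reference after subtraction of background, inhibition is expressed as a percentage of the agonist-induced signal, and a hypothetical tolerance of 5% separates a measurable effect from no effect.

```python
def classify_modulator(alone_pct: float, inhibition_pct: float, tol: float = 5.0) -> str:
    """Classify a modulator from two reporter-assay readouts.

    alone_pct      -- signal produced by the modulator alone, as a percent of
                      the full agonist reference (negative = below background)
    inhibition_pct -- percent inhibition of agonist-induced signaling
    tol            -- hypothetical threshold for a measurable effect
    """
    if alone_pct < -tol and inhibition_pct > tol:
        return "inverse agonist"
    if alone_pct > 100.0 + tol:
        return "superagonist"
    if alone_pct >= 100.0 - tol:
        return "full agonist"
    if alone_pct > tol:
        return "partial agonist"
    if abs(alone_pct) <= tol:  # no influence on signaling when added alone
        if inhibition_pct >= 100.0 - tol:
            return "antagonist"
        if inhibition_pct > tol:
            return "partial antagonist"
        return "inactive"
    return "unclassified"


for readout in [(60, 0), (130, 0), (0, 100), (0, 40), (-30, 80)]:
    print(readout, classify_modulator(*readout))
```

In practice the tolerance would be set from the variability of the particular reporter assay rather than fixed at 5%.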

2. Motility Assays

Agonistic activity on several categories of cell surface molecules (e.g., GPCRs such as chemokine receptors, histamine H4, cannabinoid receptors, etc.) can lead to cell movements. Thus, partial or full agonist or antagonist activities of test molecules can be monitored via effects on cell motility, such as in chemotaxis assays (Ghosh et al., (2006) J Med. Chem. May 4; 49(9):2669-2672), chemokinesis assays (Gillian et al., (2004) ASSAY and Drug Development Technologies. 2(5): 465-472) or haptotaxis assays (Hintermann et al., (2005) J. Biol. Chem. 280(9): 8004-8015).

3. Whole Cell Binding Assays

Binding assays that utilize receptors, membrane associated antibodies, and cell surface proteins can be performed using whole cells (as opposed to membrane preparations) in order to monitor activity or binding selectivity of proteins of interest. Such assays can also be used to directly select desired cell populations via the use of FACS (Fitzgerald et al., (1998) J Pharmacol Exp Ther. November; 287(2):448-456; Baker, (2005) Br J. Pharmacol. February; 144(3):317-22).

A large number of fluorescently tagged compounds are available to perform whole cell binding assays. In addition, specific peptides can be readily labeled in order to profile the binding affinity and selectivity of membrane associated antibodies. In general, peptides can be conjugated to a wide variety of fluorescent dyes, quenchers and haptens such as fluorescein, R-phycoerythrin, and biotin. Conjugation can occur either during peptide synthesis or after the peptide has been synthesized and purified.

Biotin is a small (244 daltons) vitamin that binds with high affinity to avidin and streptavidin proteins and can be conjugated to most peptides without altering their biological activities. Biotin-labeled peptides are easily purified from unlabeled peptides using immobilized streptavidin and avidin affinity gels, and streptavidin or avidin-conjugated probes can be used to detect biotinylated peptides in, for example, ELISA, dot blot or Western blot applications.

N-hydroxysuccinimide esters of biotin are the most commonly used type of biotinylation agent. N-hydroxysuccinimide-activated biotins react efficiently with primary amino groups in physiological buffers to form stable amide bonds. Peptides have primary amines at the N-terminus and can also have several primary amines in the side chain of lysine residues that are available as targets for labeling with N-hydroxysuccinimide-activated biotin reagents. Several different N-hydroxysuccinimide esters of biotin are available, with varying properties and spacer arm length (Pierce, Rockford, Ill.). The sulfo-N-hydroxysuccinimide ester reagents are water soluble, enabling reactions to be performed in the absence of organic solvents.

Alternatively, peptides can be conjugated with R-Phycoerythrin, a red fluorescent protein. R-Phycoerythrin is a phycobiliprotein isolated from marine algae. There are several properties that make R-Phycoerythrin ideal for labeling peptides, including an absorbance spectrum that includes a wide range of potential excitation wavelengths, solubility in aqueous buffers and low nonspecific binding. R-Phycoerythrin also has a high fluorescence quantum yield (0.82 at 578 nanometers) that is temperature and pH independent over a broad range. Conjugating peptides with R-Phycoerythrin can be accomplished using art-recognized techniques described in, for example, Glazer, A N and Stryer L. (1984) Phycofluor probes. Trends Biochem. Sci. 9:423-7; Kronick, M N and Grossman, P D (1983) Immunoassay techniques with fluorescent phycobiliprotein conjugates. Clin. Chem. 29:1582-6; Lanier, L L and Loken, M R (1984) Human lymphocyte subpopulations identified by using three-color immunofluorescence and flow cytometry analysis: Correlation of Leu-2, Leu-3, Leu-7, and Leu-11 cell surface antigen expression. J. Immunol., 132:151-156; Parks, D R et al. (1984) Three-color immunofluorescence analysis of mouse B-lymphocyte subpopulations. Cytometry 5:159-68; Hardy, R R et al. (1983) Demonstration of B-cell maturation in X-linked immunodeficient mice by simultaneous three-color immunofluorescence. Nature 306:270-2; Hardy R R et al. (1984) J. Exp. Med. 159:1169-88; and Kronick, M N (1986) The use of phycobiliproteins as fluorescent labels in immunoassay. J. Immuno. Meth. 92:1-13.

A number of cross-linkers can be used to produce phycobiliprotein conjugates including, but not limited to, N-succinimidyl 3-[2-pyridyldithio]-propionate, succinimidyl 6-(3-[2-pyridyldithio]-propionamido)hexanoate, or sulfosuccinimidyl 6-(3-[2-pyridyldithio]-propionamido)hexanoate. Such cross-linkers react with surface-exposed primary amines of the phycobiliprotein and create pyridyldisulfide group(s) that can be reacted with peptides that contain either free sulfhydryl groups or primary amines.

Another option is to label peptides with fluorescein isothiocyanate (molecular weight 389). The isothiocyanate group on the fluorescein will cross-link with amino, sulfhydryl, imidazoyl, tyrosyl or carbonyl groups on peptides, but generally only derivatives of primary and secondary amines yield stable products. Fluorescein isothiocyanate has excitation and emission wavelengths at 494 and 520 nanometers, respectively, and a molar extinction coefficient of 72,000 M⁻¹ cm⁻¹ in an aqueous buffer at pH 8 (Der-Balian G, Kameda, N. and Rowley, G. (1988) Fluorescein labeling of Fab while preserving single thiol. Anal. Biochem. 173:59-63).
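As an illustration of how a molar extinction coefficient is used, the concentration of fluorescein label in a conjugate solution can be estimated from its absorbance with the Beer-Lambert relation A = ε·c·l. The Python sketch below is a minimal example under stated assumptions: the extinction coefficient is taken as 72,000 M⁻¹ cm⁻¹ at pH 8, the absorbance is read near the 494 nanometer excitation maximum, the cuvette path length is 1 cm, and the function name and example reading are hypothetical.

```python
# Assumed value: molar extinction coefficient of fluorescein
# isothiocyanate in aqueous buffer at pH 8, in M^-1 cm^-1.
FITC_EXTINCTION_M_CM = 72_000.0


def molar_concentration(absorbance: float,
                        extinction: float = FITC_EXTINCTION_M_CM,
                        path_cm: float = 1.0) -> float:
    """Beer-Lambert estimate: c = A / (epsilon * l), returned in mol/L."""
    if extinction <= 0 or path_cm <= 0:
        raise ValueError("extinction and path length must be positive")
    return absorbance / (extinction * path_cm)


# Hypothetical absorbance reading of 0.36 in a 1 cm cuvette.
conc = molar_concentration(0.36)
print(f"{conc * 1e6:.2f} micromolar")
```

With these example values the estimate is about 5 micromolar of fluorescein; dividing by the molar peptide concentration would give an approximate degree of labeling.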

4. Whole Cell Activity Assays

Many proteins, including enzymes, intrabodies and receptors, can be directly assayed within a living cell, or when displayed on the cell surface. Typically, successful FACS based screening requires a fluorescent or fluorogenic membrane permeant substrate; many such reagents are commercially available, for example from Molecular Probes (Invitrogen, CA). An increase in enzyme activity typically results in increased production of a fluorescent product that is trapped within the cell, resulting in cells with more fluorescence which can be separated from less fluorescent cells, for example by FACS. Additionally, many high throughput microplate screens exist for screening of protein libraries that exploit virtually any existing assay of enzymatic activity; see generally Geddie, et al., Meth. Enzymol. 388 134-145 (2004).

5. Cell Growth Assays

The expression or activity of a variety of proteins such as, for example, growth factors, enzymes, receptors and antibodies can influence the rate of growth of a host cell, which can be exploited either as an assay or as a means of separating improved proteins.

Thus in one assay format, cells can be diluted to a limiting dilutionand cells which grow more rapidly detected and selected. In one aspectsuch growth based assays can involve the ability to grow in the presenceof a new substrate for which an improved enzymatic pathway of metabolismis required, for example a new carbon source. In another embodiment,growth assays can involve selection in the presence of a toxin, where ade-activation mechanism for the toxin is required. In another case,growth can be desired in response to the presence of a specific ligand,where high affinity binding of the ligand is required.

B. Selection and Enrichment Strategies

1. Flow Cytometry and FACS

Flow cytometry and the related flow sorting (also known as fluorescence activated cell sorting, or FACS) are methods by which individual cells can be quantitatively assayed for the presence of a specific component or component variant based upon staining with a fluorescent reporter. Flow cytometry provides quantitative, real time analysis of living cells, can achieve efficient cell sorting rates of 50,000 cells/second, and is capable of selecting individual cells or defined populations. Many commercial FACS systems are available, for example from BD Biosciences (CA), Cytopeia (Seattle, Wash.) and Dako Cytomation (Australia).

A FACS can be equipped with a variety of lasers, which can produce a wide range of wavelengths for multiple parameter analysis and for use with different fluorophores. Classically, water cooled ion lasers using argon, krypton, or a mix of both can produce several specific lines; 408 nm, 568 nm, and 647 nm, for example, are major emission lines for krypton; 488 nm, 457 nm, and others are argon lines. These lasers require high voltage multiphase power and cooling water, but can produce high power outputs. Additionally, tunable and non-tunable diode lasers exist; for example, a 408 nm line can be stably created via a light emitting diode (LED), and this can be easily added to a sorter. Dye lasers can also be used to further extend the range of wavelengths available for FACS analysis.

During FACS analysis, cells are stained with the specific reporter and then hydrodynamically focused into a single cell stream for interrogation with a laser which excites the fluorescent moiety. Fluorescent emission is detected through a wavelength restricted optical pathway and converted to numeric data correlated to an individual cell. In the case of flow sorting, cells that meet predefined emission criteria are diverted into a collection receptacle for further use by electrostatic repulsion or mechanical action (Herzenberg L A, Sweet R G, Herzenberg L A: Fluorescence activated cell sorting, Sci Amer 234(3):108, March 1976).

FACS based approaches are compatible with signal transduction basedassays, activity based assays, and binding assays, and with a widevariety of proteins of interest, including for example, antibodies,receptors, enzymes and any surface displayed protein. FACS can beefficiently applied to most mammalian, yeast and bacterial cells, aswell as fluorescently tagged beads.

In one embodiment, FACS can be used to screen a library of cells expressing surface displayed proteins (e.g., surface displayed antibodies) that are undergoing, or have undergone, SHM mediated diversity creation. In this approach, a cell surface displayed library is used and the displayed proteins are first incubated with fluorescently tagged antigen in solution. The FACS instrument is able to separate the high affinity protein members of the library, which have greater fluorescence intensity, from the lower affinity members. The use of optimized binding protocols in conjunction with FACS based selection has been shown to be capable of evolving antibodies with up to femtomolar affinities (see, e.g., Boder et al. PNAS, (2000) 97: 10701-10705; Boder et al., Meth. Enzymol. (2000) 328: 430-444; VanAntwerp et al., Biotechnol. Prog. (2000) 16: 31-37).

In order to effectively select and rapidly evolve antibodies and binding proteins which have high affinity to an antigen of interest, protocols can be established that facilitate the isolation of antibodies with a broad range of affinities to the antigens of interest, and yet eliminate proteins that bind to labeling or coupling reagents. These protocols involve both a progression in the stringency of the cell population selected, and a decrease in the concentration and density of the target antigen presented to the cells.

With respect to the stringency, or fraction of the total cell population collected during each round of selection, initial screens will generally use relatively low discrimination factors in order to capture as many proteins as possible that possess small incremental improvements in binding characteristics. For example, a typical initial sort may capture the top 10%, top 5% or top 2% of all cells that bind a target. Large improvements in affinity may be the result of combinations of mutations, each of which contributes small additive effects to overall affinity (Hawkins et al., (1993) J. Mol. Biol. 234: 958-964). Therefore, recovery of all library clones with even marginally improved affinities (2-3 fold) is desirable during the early stages of library screening, and sorting gates can be optimized to recover as many clones as possible with minimum sacrifice in enrichment.
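
As a purely illustrative sketch of how a sorting gate of this kind could be expressed numerically, the following Python fragment computes the fluorescence cutoff that retains a chosen top fraction of cells. The function names and the plain list of per-cell intensities are assumptions made for illustration only; they do not describe the software of any particular sorter.

```python
def gate_threshold(intensities, top_fraction):
    """Return the lowest intensity still inside the top `top_fraction` of cells."""
    if not intensities:
        raise ValueError("no cells to gate")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(intensities, reverse=True)
    # Always keep at least one cell, even for very small fractions.
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


def sort_cells(intensities, top_fraction):
    """Keep every cell at or above the gate (ties at the cutoff are all kept)."""
    cutoff = gate_threshold(intensities, top_fraction)
    return [value for value in intensities if value >= cutoff]


if __name__ == "__main__":
    cells = list(range(1, 101))          # 100 hypothetical intensities
    for fraction in (0.10, 0.05, 0.02):  # initial-sort gates named in the text
        print(fraction, gate_threshold(cells, fraction))
```

A wide gate (for example 0.10) keeps marginally improved clones at the cost of enrichment, while a narrow gate does the reverse; the same function could be reused with smaller fractions in later rounds.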

These selected cells can subsequently be allowed to recover and be grown using standard culture conditions for a number of days until the population has reached a reasonable number to allow for a subsequent round of FACS sorting, analysis, mutagenesis, cell banking, or to determine sequence information. As discussed below, subsequent rounds of selection to identify higher affinity binders can be achieved by progressively decreasing the density and concentration of labeled binding peptide used in the preincubation steps prior to FACS analysis.

Following a successful first round of sorting, the collected cells canbe re-grown to amplify the population and then resorted. At this, andsubsequent stages of sorting, greater enrichments are possible sincemore copies of each desirable clone are present within the examined cellpopulation. For example only about the top 1%, top 0.5%, top 0.2%, ortop 0.1% of the cells in the population may be selected in order toidentify significantly improved clones. With respect to establishingoptimal binding and selection strategies, first generation hits,including germline antibodies, typically have low affinities andrelatively rapid off rates. For example, Sagawa et al. (Mol. Immunology,39: 801-808 (2003)) observed that the apparent affinity for germline Absis typically in the range of 2×10⁴ to 5×10⁶M⁻¹, but that this affinityincreases to around 10⁹M⁻¹ during affinity maturation (i.e., an effectthat is mediated primarily by decreasing the off rate (K_(off))).
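
Because the passage above quotes association constants (M⁻¹) while the selection thresholds discussed elsewhere are given as Kd values, it may help to convert between the two. The sketch below assumes simple 1:1 binding, for which Kd is the reciprocal of the association constant; the function name is hypothetical.

```python
def kd_nM_from_ka(ka_per_molar):
    """Convert an association constant (M^-1) to a dissociation constant in nM.

    Assumes simple 1:1 binding, so Kd = 1 / Ka; 1 M = 1e9 nM.
    """
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return 1e9 / ka_per_molar


if __name__ == "__main__":
    examples = [
        ("germline, lower bound", 2e4),
        ("germline, upper bound", 5e6),
        ("affinity matured", 1e9),
    ]
    for label, ka in examples:
        print(f"{label}: Ka = {ka:.0e} M^-1, Kd = {kd_nM_from_ka(ka):g} nM")
```

On this assumption the quoted germline range corresponds to Kd values of roughly 50 µM down to 200 nM, and the matured value to about 1 nM.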

The binding characteristics of weak binding antibodies may slow the screening of early generation, non-optimized libraries because specific, but low affinity, binding antibodies typically have rapid off rates and therefore tend to be lost during wash steps. Loss of these specific binders may result in the isolation of antibodies that bind non-specifically to components used in the selection process (Cumbers et al., Nat. Biotechnol. 2002 November; 20(11): 1129-113).

To maximize the selection of proteins with relatively low affinities (i.e., having a Kd greater than about 500 nM), binding interactions are stabilized to prevent the dissociation of binding peptides during the screening process, and appropriate blocking reagents are included to eliminate binding to coupling reagents and support matrices. To achieve this goal, initial screens should use fluorescently tagged beads loaded with a high density of antigens to exploit avidity effects, based on the use of multiple binding interactions to increase the binding strength of low affinity interactions, while also including pre-incubations with coupling and labeling reagents such as streptavidin, avidin, and naked beads etc., to eliminate non-specific binding (see generally, Aggarwal et al., (2006) Bioconjugate Chem. 17 335-340; Wrighton et al., (1996) Science 273 458-64; Terskikh et al. (1997) PNAS 94 1663-8; Cwirla et al., (1997) Science 276 1696-9; and Wang et al. (2004) J. Immunological methods 294 23-35).

By careful control of bead loading density, washing and pre-incubation conditions, it has been demonstrated that even such low affinity binding interactions can be reproducibly monitored (Werthen et al., (1993) BBA-326-332). Importantly, these improvements to binding efficiency have been demonstrated to occur without any significant increase in non-specific reactivity (Giordano et al., (2001) Nat. Med. 7 1249-53). As discussed above, selections generally will also be based on using a relatively low stringency cut off during FACS to ensure that all of these weak binding library members are selected.

To further eliminate non-specific members of the library (i.e., thosethat bind to the beads, or coupling reagents, rather than the bindingpeptides), the resultant cell populations are screened directly witheither polymeric binding peptide or intact polymeric antigen usingdistinct coupling reagents (e.g., via the use of biotinylated antigencoupled to streptavidin-fluorophore conjugate to form anantigen-streptavidin fluorescent complex). Coupling or labeling of thebinding peptide to biotin or fluorophores can be achieved usingstandard, art-recognized protocols, as described herein and in theExamples.

Streptavidin binds biotin with femtomolar affinity and forms tetramersin physiological conditions, thereby generating a tetravalent complexwhen preincubated with singly biotinylated antigen (which issubsequently termed a streptavidin microaggregate as described below).Streptavidin pre-loading can increase the effective antigenconcentration up to 500-fold, and is useful for isolating weak antigenbinders that bind specifically to the antigen. Employment ofstreptavidin microaggregates is useful for isolating antibodies rangingin affinity from very weak to moderate (Kd greater than about 200 nM)affinities. Furthermore, biotinylated epitopes can be pre-reacted withstreptavidin-fluorophore at room temperature for 10 to 15 minutes inorder to create microaggregates prior to contacting cell populations.The microaggregates are subsequently allowed to contact cellssimultaneously for 15 to 30 minutes prior to addition of secondaryreagents, such as anti-human IgG-fluorophore conjugates. In oneexperimental approach, cells are centrifuged at 1500×g for 5 minutes andresuspended in a small volume (typically 500 μL to 1 mL) of DAPI (PBS,1% BSA, 2 μg/mL DAPI). In a second approach termed “homogeneous assayconditions,” cells are resuspended directly in DAPI into whichantigen-streptavidin microaggregate and goat-anti-human IgG-fluorophoreare added. This second approach is particularly desirable for moreweakly interacting antibodies (Kd greater than about 200 nM), whereminimizing dissociation time may be more relevant.

At higher affinities (with Kd>10 nM, but less than about 100 nM),libraries are more easily screened directly for improved affinity byincubating the library with monomeric binding peptide or full lengthtarget protein under equilibrium binding conditions at a concentrationof binding peptide that is ideally less than the Kd of the starting(wild type) interaction (apparent Kds can be readily determined by aseries of analytical FACS experiments conducted with a range of antigenconcentrations, ahead of a sort). Under these conditions, cells thatpossess antibodies and binding proteins with higher affinities willpossess significantly more fluorescently labeled binding peptide thanweaker binders, allowing the most fluorescent cells in the population tobe easily selected for further optimization. Typically, FACS sortinggates can be established that select about the top 0.5% to about 0.1% ofcells. In one non-limiting method, about the top 0.2% of cells areselected.
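
The benefit of labeling at a concentration below the Kd of the starting interaction can be illustrated with a simple occupancy calculation. The sketch below assumes 1:1 equilibrium binding with the labeled peptide in excess (no ligand depletion), so that the fraction of displayed protein occupied is [L] / ([L] + Kd); the function names and the example Kd values are hypothetical.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Equilibrium occupancy for 1:1 binding with ligand in excess."""
    return ligand_nM / (ligand_nM + kd_nM)


def signal_ratio(ligand_nM, kd_parent_nM, kd_variant_nM):
    """Fold difference in bound label between an improved variant and its parent."""
    return fraction_bound(ligand_nM, kd_variant_nM) / fraction_bound(ligand_nM, kd_parent_nM)


if __name__ == "__main__":
    kd_parent, kd_variant = 50.0, 5.0   # hypothetical 10-fold improvement
    for ligand in (5.0, 50.0, 500.0):   # labeling concentrations in nM
        ratio = signal_ratio(ligand, kd_parent, kd_variant)
        print(f"[L] = {ligand:g} nM: variant/parent signal = {ratio:.2f}")
```

Under these assumptions the improved clone is several-fold brighter than its parent when the label is well below the parental Kd, whereas near saturation both clones are almost fully occupied and become difficult to distinguish.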

As recognized by Boder and Wittrup (Biotechnol. Prog. (1998) 14 55-62),the screening of very high affinity protein-ligand interactions (Kd<10nM) can be accomplished by screening for decreased off-rate rather thandirectly for affinity. In this approach, cells are labeled to saturationwith fluorescent binding peptide, followed by addition of an excess ofnon-fluorescent ligand. Cell associated fluorescence decaysexponentially with time approaching a background level and thedissociation reaction is stopped after a fixed duration, usually byextensive dilution with cold buffer. The duration of the competitionreaction determines the difference in observed fluorescence fordifferent library clones and, thus, determines the range of kineticimprovements likely to be selected from the library. For a competitivedissociation reaction, the presence of excess non-fluorescent ligand canyield an effective forward reaction rate of zero. Mean fluorescenceintensity at a given time after the initiation of the competitionreaction is a function of the off-rate (K_(off)). (VanAntwerp & Wittrup(2000) Biotechnol. Prog. 16 31-37; Boder et al. (2000) PNAS 9710701-10705; and Foote and Eisen (2000) PNAS 97 10679-10681). Cells inthe population that express antibodies with improved affinities and morestable binding can be systematically identified by progressivelyincreasing the length of time for the competition reaction, and thenselecting the most fluorescent cells remaining in the population forfurther optimization.
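
The dependence of the observed fluorescence on competition time can be sketched with a first-order decay model. The fragment below assumes that rebinding of labeled peptide is negligible (the effective forward rate of zero noted above) and that all clones start from the same saturated signal; the function names and example rate constants are hypothetical and are not taken from the cited references.

```python
import math


def remaining_signal(t_s, k_off_per_s, f0=1.0, background=0.0):
    """Cell-associated fluorescence after t_s seconds of competition."""
    return background + (f0 - background) * math.exp(-k_off_per_s * t_s)


def time_of_max_difference(k_off_a, k_off_b):
    """Competition time at which two clones with equal f0 differ most in signal.

    Obtained by setting the derivative of exp(-k_b t) - exp(-k_a t) to zero.
    """
    if k_off_a <= 0 or k_off_b <= 0 or k_off_a == k_off_b:
        raise ValueError("rate constants must be positive and distinct")
    return math.log(k_off_a / k_off_b) / (k_off_a - k_off_b)


if __name__ == "__main__":
    parent, variant = 1e-3, 1e-4  # hypothetical off-rates in s^-1
    t_best = time_of_max_difference(parent, variant)
    gap = remaining_signal(t_best, variant) - remaining_signal(t_best, parent)
    print(f"largest gap {gap:.2f} of initial signal at about {t_best:.0f} s")
```

A longer competition time favors clones with slower off-rates, which is consistent with progressively lengthening the competition reaction over successive rounds.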

Under these conditions, cells that possess surface displayed antibodiesand binding proteins with higher affinities will exhibit significantlymore bead or streptavidin-biotinylated antigen microaggregate bindingcompared to cells that express proteins with little or no binding. Themost fluorescently labeled cells (displaying proteins with the highestaffinity) can then be separated from the rest of the cells in thepopulation using standard FACS sorting protocols, as described, forexample, in Example 9.

Once a selected cell population has been created that expresses aprotein that exhibits reproducible binding to a binding peptide, it canbe characterized with two or more intact proteins to confirm that theantibodies or binding proteins exhibit the desired pattern ofcross-reactivity and/or specificity (e.g., to both mouse and humanvariants of the protein of interest), or to two different members of arelated gene family, but not to an unrelated, or more distantly related,protein.

In one embodiment, this can be accomplished using multi-parameter FACS using two or more protein species labeled with differently colored detectable tags (e.g., FITC and phycoerythrin) which can be simultaneously analyzed in a flow cytometer. Using this approach, it is possible to identify cells that display binding to only one protein, or are capable of binding to both proteins. The population of cells that exhibits the required dual specific binding can be selected by the FACS operator based upon the number of cells sorted and the percentage of cells identified that exhibit polyspecificity. As described previously, these selected cells can subsequently be allowed to recover and be grown using standard culture conditions for a number of days until the population has reached a reasonable number to enable either a subsequent round of FACS sorting, analysis, cell banking, or to determine sequence information.
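
A two-color gate of this kind can be sketched as a simple quadrant classification. In the fragment below, the cutoff values, the channel names and the function names are hypothetical and chosen for illustration only; in practice the gates would be set by the operator from control samples.

```python
from collections import Counter


def classify_cell(fitc, pe, fitc_cutoff, pe_cutoff):
    """Assign a cell to a quadrant from its two fluorescence channels."""
    binds_first = fitc >= fitc_cutoff
    binds_second = pe >= pe_cutoff
    if binds_first and binds_second:
        return "dual"
    if binds_first:
        return "first_only"
    if binds_second:
        return "second_only"
    return "negative"


def quadrant_counts(cells, fitc_cutoff, pe_cutoff):
    """Count cells per quadrant; `cells` is an iterable of (fitc, pe) pairs."""
    return Counter(classify_cell(f, p, fitc_cutoff, pe_cutoff) for f, p in cells)


if __name__ == "__main__":
    events = [(900, 850), (950, 20), (15, 700), (10, 12), (880, 910)]
    counts = quadrant_counts(events, fitc_cutoff=500, pe_cutoff=500)
    print(dict(counts))
    print("dual-specific fraction:", counts["dual"] / len(events))
```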

Selected binders from the library can be further characterized asdescribed herein, and the sequence of the antibody or binding proteindetermined after PCR of cellular DNA, RT-PCR of RNA isolated from theselected cell population, or episome rescue.

Candidate antibodies and binding proteins can be iteratively subjectedto rounds of hypermutation and selection in order to evolve populationsof cells expressing antibodies or binding proteins with enhanced bindingproperties as described herein. Cells that preferentially and/orselectively bind to the binding peptide with a higher affinity areselected and allowed to expand. If needed, another round of mutagenesisis repeated and, again, cells that exhibit improved, selective, and highaffinity binding, are retained for further propagation and growth. Thenew improved variants obtained can be further characterized as describedherein, and the sequence of the heavy and light chains determined afterRT-PCR or episome rescue.

Mutations that are identified in the first one, two or three rounds ofhypermutation/selection can be recombined combinatorially into a set ofnew templates within the original parental backbone context, and all, ora subset of the resulting templates, can be subsequently transfectedinto cells which are then selected by FACS sorting. The bestcombination(s) of mutations are thus isolated and identified, and eitherused in a subsequent round of hypermutation/selection, or if the newlyidentified template(s) demonstrate sufficiently potent affinity, areused instead in experiments for further functional characterization.
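
The combinatorial recombination step can be sketched by enumerating every set of identified mutations that could coexist in the parental backbone. The representation of a mutation as a (position, residue) pair and the function name below are assumptions made for illustration; alternative substitutions at the same position are treated as mutually exclusive.

```python
from itertools import combinations


def combinatorial_templates(mutations, max_mutations=None):
    """Return all non-empty mutation sets with at most one change per position.

    `mutations` is a list of (position, new_residue) pairs.
    """
    limit = len(mutations) if max_mutations is None else max_mutations
    templates = []
    for size in range(1, limit + 1):
        for combo in combinations(mutations, size):
            positions = [position for position, _ in combo]
            # Skip sets that would place two residues at the same position.
            if len(set(positions)) == len(positions):
                templates.append(combo)
    return templates


if __name__ == "__main__":
    hits = [(31, "N"), (52, "Y"), (52, "F")]  # hypothetical selected mutations
    for template in combinatorial_templates(hits):
        print("+".join(f"{pos}{res}" for pos, res in template))
```

Because the number of combinations grows rapidly with the number of mutations, the optional `max_mutations` argument offers one way to build only a subset of the templates, as contemplated above.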

In another embodiment, FACS can be used to screen a library of cells expressing intracellular proteins that are undergoing, or have undergone, SHM mediated diversity creation. In this approach, a membrane permeable fluorogenic or fluorescent reagent is used and first pre-incubated with the library of cells to allow uptake and conversion of the reagent. The FACS instrument is able to separate the high activity protein members of the library, which are able to convert a greater percentage of the reagent and are more fluorescent, from cells comprising lower activity members (see, e.g., Farinas, Comb. Chem. High Throughput Screen. (2006) 9: (4) 321-328).

Fluorescent moieties to be detected include, but are not limited to,compounds such as fluorescein (commonly called FITC), phycobiliproteinssuch as phycoerythrin (PE) and allophycocyanin (APC) (Kronick, M. N. J.Imm. Meth. 92:1-13 (1986)), fluorescent semiconductor nanocrystals suchas Quantum dot (QDot) bioconjugates for ultrasensitive nonisotopicdetection (Chan W C, Nie S. Science 281: 2016-8 (1998)), and coumarinderivatives such as Fluorescent Acylating Agents derived from7-Hydroxycoumarin.

Fluorescence can also be reported from fluorescent proteins such as Teal Fluorescent Protein (TFP), from chemical stains of cellular components such as DAPI bound to DNA, from fluorescent moieties covalently conjugated to antibodies that recognize cellular products, from fluorescent moieties covalently conjugated to ligands of cellular receptors, and from fluorescent moieties covalently conjugated to substrates of cellular enzymes.

Cells stained with membrane impermeant reporters, such as antibodies,can be sorted for subsequent processing to recover components such asgenes, episomes, or proteins of interest. Cells stained for surfaceexpression components or stained with cell membrane permeant reporterscan also be sorted intact for propagation.

2. Affinity Separation

Affinity separation based on the use of microparticles enables the separation of surface displayed proteins based on affinity to a specific compound or sequence of interest. This approach is rapid, can easily be scaled up, and can be used iteratively with living cells.

Paramagnetic polystyrene microparticles are commercially available (Spherotech, Inc., Libertyville, Ill.; Invitrogen, Carlsbad, Calif.) for coupling compounds or peptides to microparticle surfaces that have been modified with functional groups or coated with various antibodies or ligands such as, for example, avidin, streptavidin or biotin.

In one aspect, paramagnetic beads can be used in which the paramagnetic property of the microparticles allows them to be separated from solution using a magnet. The microparticles can be easily re-suspended when removed from the magnet, thereby enabling the selective separation of cells that bind to the attached probe.

In one embodiment, peptides can be coupled to paramagnetic polystyrene microparticles coated with a polyurethane layer in a tube. The hydroxyl groups on the microparticle surface are activated by reaction with p-toluenesulfonyl chloride (Nilsson K and Mosbach K. “p-Toluenesulfonyl chloride as an activating agent of agarose for the preparation of immobilized affinity ligands and proteins.” Eur. J. Biochem. 1980:112:397-402). The resulting sulfonyl ester can subsequently react covalently with peptide amino or sulfhydryl groups. The peptides are quickly adsorbed onto the surface of the activated microparticles, followed by the formation of covalent amine bonds with further incubation. The microparticles (2″ microparticles/milliliter) are washed two times by placing the tube containing 1 milliliter (ml) of microparticles on a magnet, allowing the microparticles to migrate to the magnet side of the tube, removing the supernatant, and re-suspending the microparticles in 1 ml of 100 millimolar (mM) borate buffer, pH 9.5. After washing, the microparticles are re-suspended in 100 mM borate buffer, pH 9.5 at a concentration of 10⁹ microparticles/ml. Eleven nanomoles of peptide are added to the microparticles and the microparticle/peptide mixture is vortexed for 1 minute to mix. The microparticles are incubated with peptides at room temperature for at least 48 hours with slow tilt rotation. To ensure an optimal orientation of the peptide on the microparticles, bovine serum albumin (BSA) is added to the microparticle/peptide mixture to a final concentration of 0.1% (weight/volume) after incubation has proceeded for 10 minutes. After incubation, the tube containing the microparticle/peptide mixture is placed on the magnet until the microparticles migrate to the magnet side of the tube. The supernatant is removed and the microparticles are washed four times with 1 ml phosphate buffered saline solution (PBS), pH 7.2 containing 1% (weight/volume) BSA. Finally, the microparticles are re-suspended in 1 ml PBS solution, pH 7.2 containing 1% (weight/volume) BSA.

Alternatively, paramagnetic polystyrene microparticles containing surface carboxylic acid groups can be activated with a carbodiimide followed by coupling to a peptide, resulting in a stable amide bond between a primary amino group of the peptide and the carboxylic acid groups on the surface of the microparticles (Nakajima N and Ikada Y, Mechanism of amide formation by carbodiimide for bioconjugation in aqueous media, Bioconjugate Chem. 1995, 6(1), 123-130; Gilles M A, Hudson A Q and Borders C L Jr, Stability of water-soluble carbodiimides in aqueous solution, Anal Biochem. 1990 Feb. 1; 184(2):244-248; Sehgal D and Vijay I K, A method for the high efficiency of water-soluble carbodiimide-mediated amidation, Anal Biochem. 1994 April; 218(1):87-91; Szajani B et al, Effects of carbodiimide structure on the immobilization of enzymes, Appl Biochem Biotechnol. 1991 August; 30(2):225-231). The microparticles (2 microparticles/milliliter) are washed twice with 1 ml of 25 mM 2-[N-morpholino]ethane sulfonic acid, pH 5 for 10 minutes with slow tilt rotation at room temperature. The washed microparticles are re-suspended in 700 microliters (μL) of 25 mM 2-[N-morpholino]ethane sulfonic acid, pH 5, followed by the addition of 21 nanomoles of peptide re-suspended in 25 mM 2-[N-morpholino]ethane sulfonic acid, pH 5 to the microparticle solution. The microparticle/peptide mixture is mixed by vortexing and incubated with slow tilt rotation for 30 minutes at room temperature. After this first incubation, 300 μL of ice-cold 100 milligram (mg)/mL 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride re-suspended in 25 mM 2-[N-morpholino]ethane sulfonic acid, pH 5 is added to the peptide/microparticle mixture and incubated overnight at 4° Celsius with slow tilt rotation. The peptide-coupled microparticles are washed four times with 1 ml of 50 mM Tris pH 7.4/0.1% BSA for 15 minutes at room temperature with slow tilt rotation. After washing, the peptide-coupled microparticles are re-suspended at a concentration of 10⁹ microparticles/ml in PBS solution, pH 7.2 containing 1% (weight/volume) BSA.

Another option is to couple biotinylated peptides to paramagnetic polystyrene microparticles whose surfaces have been covalently linked with a monolayer of streptavidin. Briefly, one ml of the streptavidin microparticles is transferred to a microcentrifuge tube and washed four times by placing the tube on a magnet and allowing the microparticles to collect on the magnet side of the tube. The solution is then removed and the microparticles are gently re-suspended in 1 ml of PBS solution, pH 7.2 containing 1% (weight/volume) BSA. After the final wash, the microparticles are re-suspended in 1 ml of PBS solution, pH 7.2 containing 1% (weight/volume) BSA, and 33 picomoles of biotinylated peptide are added to the microparticle solution. The microparticle/peptide solution is incubated for 30 minutes at room temperature with slow tilt rotation. After coupling, the unbound biotinylated peptide is removed from the microparticles by washing four times with PBS solution, pH 7.2 containing 1% (weight/volume) BSA. After the final wash, the microparticle/peptide mixture is re-suspended to a final bead concentration of 10⁹ microparticles/ml (Argarana C E, Kuntz I D, Birken S, Axel R, Cantor C R. Molecular cloning and nucleotide sequence of the streptavidin gene. Nucleic Acids Res. 1986; 14(4):1871-82; Pahler A, Hendrickson W A, Gawinowicz Kolks M A, Argarana C E, Cantor C R. Characterization and crystallization of core streptavidin. J Biol Chem 1987; 262(29):13933-7).

The identification, selection and use of specific peptide sequences foruse in the present inventions is disclosed in commonly owned priorityapplication No. 60/995,970 (Attorney docket no. 33547-708.101), filedSep. 28, 2007.

XII. Databases

The invention includes methods of producing computer-readable databasescomprising the sequence and identified mutations of certain proteins,including, but not limited to, sequences of binding domains, or activesites, as well as their binding characteristics, activity, stabilitycharacteristics and three-dimensional molecular structure. Specificallyincluded in the present invention is the use of such a database to aidin the design and optimization of a protein of interest, based on adatabase of mutations created from the protein of interest, or relatedproteins or portions thereof.

In other embodiments, the databases of the present invention cancomprise mutations of a protein or proteins that have been identified byscreening to bind to a specific target, or other representations of suchproteins such as, for example, a graphic representation or a name.

By “database” is meant a collection of retrievable data. The invention encompasses machine readable media embedded with or containing information regarding the amino acid and nucleic acid structure of a protein or proteins, such as, for example, its sequence, structure, and activity or binding activity, as described herein. Such information can pertain to subunits, domains, and/or portions thereof such as, for example, portions comprising active sites, accessory binding sites, and/or binding pockets in either liganded (bound) or unliganded (unbound) forms.

Alternatively, the information can be that of identifiers whichrepresent specific structures found in a protein. As used herein,“machine readable medium” refers to any medium that can be read andaccessed directly by a computer or scanner. Such media can take manyforms, including but not limited to, non-volatile, volatile andtransmission media. Non-volatile media, i.e., media that can retaininformation in the absence of power, includes a ROM. Volatile media,i.e., media that cannot retain information in the absence of power,includes a main memory.

Transmission media includes coaxial cables, copper wire and fiberoptics, including the wires that comprise the bus. Transmission mediacan also take the form of carrier waves; i.e., electromagnetic wavesthat can be modulated, as in frequency, amplitude or phase, to transmitinformation signals. Additionally, transmission media can take the formof acoustic or light waves, such as those generated during radio waveand infrared data communications.

Such media also include, but are not limited to: magnetic storage media,such as floppy discs, flexible discs, hard disc storage medium andmagnetic tape; optical storage media such as optical discs or CD-ROM;electrical storage media such as RAM or ROM, PROM (i.e., programmableread only memory), EPROM (i.e., erasable programmable read only memory),including FLASH-EPROM, any other memory chip or cartridge, carrierwaves, or any other medium from which a processor can retrieveinformation, and hybrids of these categories such as magnetic/opticalstorage media. Such media further include paper on which is recorded arepresentation of the amino acid or polynucleotide sequence, that can beread by a scanning device and converted into a format readily accessedby a computer or by any of the software programs described herein by,for example, optical character recognition (OCR) software. Such mediaalso include physical media with patterns of holes, such as, forexample, punch cards and paper tape.

Specifically included in the present invention is the transmission of data from the database via transmission media to a third party site to aid in the design and optimization of a protein of interest.

A variety of data storage structures are available for creating acomputer readable medium having recorded thereon the amino acid orpolynucleotide sequences of the invention or portions thereof and/oractivity data. The choice of the data storage structure can be based onthe means chosen to access the stored information. All formatrepresentations of the amino acid or polynucleotide sequences describedherein, or portions thereof, are contemplated by the present invention.By providing computer readable medium having stored thereon thesequences of the invention, one can routinely access the SHM mediatedchanges in amino acid or polynucleotide sequence and related informationfor use in modeling and design programs, to create improved proteins.
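
By way of illustration only, one possible data storage structure for SHM mediated changes is a list of per-mutation records serialized to a text format. The sketch below uses Python dataclasses and JSON; the record fields, class name and function names are hypothetical choices and do not represent a required or preferred format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MutationRecord:
    """One observed amino acid change and its associated measurements."""
    protein: str
    position: int
    parent_residue: str
    variant_residue: str
    selection_round: int
    kd_nM: Optional[float] = None  # binding data may not yet be available


def dump_records(records):
    """Serialize records to a JSON string suitable for storage or transmission."""
    return json.dumps([asdict(record) for record in records], indent=2)


def load_records(text):
    """Rebuild records from a JSON string produced by dump_records."""
    return [MutationRecord(**item) for item in json.loads(text)]


if __name__ == "__main__":
    records = [
        MutationRecord("example antibody heavy chain", 31, "S", "N", 1, 120.0),
        MutationRecord("example antibody heavy chain", 52, "A", "Y", 2),
    ]
    text = dump_records(records)
    assert load_records(text) == records
    print(text)
```

A relational table or a sequence alignment file would serve equally well; the choice would depend on how the stored information is to be accessed by modeling and design programs.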

A computer can be used to display the sequence of the protein or peptidestructures, or portions thereof, such as, for example, portionscomprising active sites, accessory binding sites, and/or bindingpockets, in either liganded or unliganded form, of the presentinvention. The term “computer” includes, but is not limited to,mainframe computers, personal computers, portable laptop computers, andpersonal data assistants (“PDAs”) which can store data and independentlyrun one or more applications, i.e., programs. The computer can include,for example, a machine readable storage medium of the present invention,a working memory for storing instructions for processing themachine-readable data encoded in the machine readable storage medium, acentral processing unit operably coupled to the working memory and tothe machine readable storage medium for processing the machine readableinformation, and a display operably coupled to the central processingunit for displaying the structure coordinates or the three-dimensionalrepresentation.

The computers of the present invention can also include, for example, acentral processing unit, a working memory which can be, for example,random-access memory (RAM) or “core memory,” mass storage memory (forexample, one or more disk drives or CD-ROM drives), one or morecathode-ray tube (“CRT”) display terminals or one or more LCD displays,one or more keyboards, one or more input lines, and one or more outputlines, all of which are interconnected by a conventional bi-directionalsystem bus. Machine-readable data of the present invention can beinputted and/or outputted through a modem or modems connected by atelephone line or a dedicated data line (either of which can include,for example, wireless modes of communication). The input hardware canalso (or instead) comprise CD-ROM drives or disk drives. Other examplesof input devices are a keyboard, a mouse, a trackball, a finger pad, orcursor direction keys. Output hardware can also be implemented byconventional devices. For example, output hardware can include a CRT, orany other display terminal, a printer, or a disk drive. The CPUcoordinates the use of the various input and output devices, coordinatesdata accesses from mass storage and accesses to and from working memory,and determines the order of data processing steps. The computer can usevarious software programs to process the data of the present invention.Examples of many of these types of software are discussed throughout thepresent application.

EXAMPLES

Elements of the present application are illustrated by the following examples, which should not be construed as limiting in any way.

Example 1 Creation of Synthetic Polynucleotides Encoding Blasticidin

By decreasing the likelihood of somatic hypermutation in a vector element, such as a selectable marker, an enzyme involved in SHM, or a reporter gene, the vector and system for exerting and tracking SHM become more stable, thereby enabling somatic hypermutation to be more effectively targeted to a polynucleotide of interest.

A. Polynucleotide Design

In general, sequences are engineered for SHM using the teaching described herein, and as elaborated in sections III and IV. In the following examples, the sequence optimization is based on the hot spot and cold spot motifs listed in Table 7, and uses the computer program SHMredesign.pl as described above.

Using this program, every position within the sequence is annotated with a ‘+’, ‘−’, or ‘.’ symbol to designate whether a hotter, colder, or neutral change in SHM susceptibility is desired at that specific position: ‘+’ designates a hot spot, ‘−’ a cold spot, and ‘.’ a neutral position. For example, the following input sequence for blasticidin is used to identify SHM resistant versions at every position of the blasticidin gene.

(SEQ ID NO: 26) >ATGGCCAAGCCTTTGTCTCAAGAAGAATCCACCCTCATTGAAAGAGCAACGGCTACAATCAACAGCATCCCCATCTCTGAAGACTACAGCGTCGCCAGCGCAGCTCTCTCTAGCGACGGCCGCATCTTCACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGCGCAGAACTCGTGGTGCTGGGCACTGCTGCTGCTGCGGCAGCTGGCAACCTGACTTGTATCGTCGCGATCGGAAATGAGAACAGGGGCATCTTGAGCCCCTGCGGACGGTGCCGACAGGTTCTTCTCGATCTGCATCCTGGGATCAAAGCCATAGTGAAGGACAGTGATGGACAGCCGACGGCAGTTGGGATTCGTGAATTGCTGCCCTCTGGTTATGTGTG GGAGGGCTAA<---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

By comparison, the following input file is used to identify hotter versions of the blasticidin gene that are more susceptible to SHM at every position of the gene.

(SEQ ID NO: 26) >ATGGCCAAGCCTTTGTCTCAAGAAGAATCCACCCTCATTGAAAGAGCAACGGCTACAATCAACAGCATCCCCATCTCTGAAGACTACAGCGTCGCCAGCGCAGCTCTCTCTAGCGACGGCCGCATCTTCACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGCGCAGAACTCGTGGTGCTGGGCACTGCTGCTGCTGCGGCAGCTGGCAACCTGACTTGTATCGTCGCGATCGGAAATGAGAACAGGGGCATCTTGAGCCCCTGCGGACGGTGCCGACAGGTTCTTCTCGATCTGCATCCTGGGATCAAAGCCATAGTGAAGGACAGTGATGGACAGCCGACGGCAGTTGGGATTCGTGAATTGCTGCCCTCTGGTTATGTGTGGGAGGGC TAA<+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
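The two listings above pair a sequence line with an annotation line of equal length. A minimal Python sketch of how such an input could be assembled is given below. The helper name `build_shm_input` and the exact layout (a `>` prefix for the sequence, a `<` prefix for the annotation) are assumptions inferred from the listings, not the actual interface of SHMredesign.pl.

```python
def build_shm_input(sequence: str, goal: str) -> str:
    """Return a two-line input: the sequence and a per-position annotation.

    goal is '+' (hotter), '-' (colder) or '.' (neutral).
    Hypothetical helper; the real program's input format may differ.
    """
    if goal not in {"+", "-", "."}:
        raise ValueError("goal must be '+', '-' or '.'")
    # Drop stray spaces so the annotation length matches the sequence length.
    sequence = sequence.replace(" ", "").upper()
    return ">" + sequence + "\n" + "<" + goal * len(sequence)


# Request a colder version at every position of a short fragment.
print(build_shm_input("ATGGCCAAGCCTTTG", "-"))
```

A mixed annotation (for example, cold in a marker region and neutral elsewhere) would be built the same way, by concatenating per-region symbol runs instead of repeating a single symbol.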

As described previously, during this process, all nucleotide sequences over a 9 base region consistent with the wild type protein's amino acid sequence are enumerated and scored for the number of hot spot motifs, cold spot motifs, CpG motifs, codon usage, and stretches of the same nucleotide. The program then determines whether it is possible to replace any random sequence with either a hotter, colder, or neutral polynucleotide tile.
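The enumeration step can be sketched in Python as follows. This is an illustration only: the motif lists `HOT_MOTIFS` and `COLD_MOTIFS` are stand-ins (the commonly cited WRCH/DGYW and SYC/GRS patterns) assumed in place of Table 7, the function names are hypothetical, and codon usage and homopolymer scoring are omitted for brevity.

```python
import itertools
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codon -> amino acid.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Assumed motifs (placeholders for Table 7), written as regular expressions.
HOT_MOTIFS = ["[AT][AG]C[ACT]", "[AGT]G[CT][AT]"]  # WRCH, DGYW
COLD_MOTIFS = ["[CG][CT]C", "G[AG][CG]"]            # SYC, GRS


def count_motifs(seq, motifs):
    """Count motif occurrences, including overlapping ones."""
    return sum(len(re.findall("(?=" + m + ")", seq)) for m in motifs)


def synonymous_tiles(peptide):
    """Enumerate every nucleotide tile that encodes the given peptide."""
    choices = [[c for c, aa in CODON_TABLE.items() if aa == res]
               for res in peptide]
    return ["".join(codons) for codons in itertools.product(*choices)]


def score_tiles(peptide):
    """Return (tile, hot count, cold count, CpG count) per synonymous tile."""
    return [(t, count_motifs(t, HOT_MOTIFS), count_motifs(t, COLD_MOTIFS),
             t.count("CG"))
            for t in synonymous_tiles(peptide)]


# A 9 base tile: the first three codons of the blasticidin ORF (ATG GCC AAG)
# encode M-A-K. List the tiles with the most cold spots and fewest hot spots.
ranked = sorted(score_tiles("MAK"), key=lambda r: (r[2], -r[1]), reverse=True)
for row in ranked[:3]:
    print(row)
```

Sorting on the opposite key would rank candidate tiles for a hotter design; a neutral design would keep tiles whose counts match the starting tile.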

As shown in FIG. 7, this approach, as applied to canine AID, quickly converges (within a few hundred tile substitutions) on a cold optimized canine AID sequence, which differs from the original sequence through the substitution of 15-20% of the nucleotide sequence. The majority of changes occur early in the iterative cycle and are usually complete after about 500 iterations. As one might expect, larger genes require a larger number of iterations to reach a fully optimized sequence. Routinely, the use of 2000 to 3000 iterations is more than sufficient for the majority of genes.

Analysis of a number of unmodified genes at random demonstrates that most mammalian genes use codons that create on average about 9 to 15 cold spots per 100 nucleotides, with a median density of about 13.8 cold spots per 100 nucleotides, and have a hot spot density of about 7 to 13 hot spots per 100 nucleotides, with a median density of about 8.3 hot spots per 100 nucleotides.

The initial starting sequence, as well as the frequency of hot spots, cold spots and CpGs for the unmodified blasticidin gene, are shown in FIG. 8.

1. Cold Blasticidin

An optimized sequence for a SHM resistant (cold) version of blasticidin created using this approach is shown in FIG. 9, together with the resulting changes in frequency of hot spots and cold spots. Optimization of the blasticidin sequence to make the sequence more resistant to somatic hypermutation resulted in an increase of 188% in number of cold spots (an increase of 73), and reduced the number of hot spots by 57% (a decrease of 15). Overall the frequency of cold spots increased to an average density of about 28 cold spots per 100 nucleotides from an initial density of about 15 cold spots per 100 nucleotides, and the overall frequency of hot spots decreased from about 9 hot spots per 100 nucleotides in the unmodified gene to about 5 hot spots per 100 nucleotides in the SHM resistant form.

2. Hot Blasticidin

An optimized sequence for a SHM susceptible version of blasticidin created using this approach is shown in FIG. 10, together with the resulting changes in frequency of hot spots and cold spots. Optimization of the blasticidin sequence to make the sequence more susceptible to somatic hypermutation resulted in an increase of about 197% in number of hot spots (an increase of 34), and reduced the number of cold spots by about 56% (a decrease of 26). Overall the frequency of hot spots increased to an average density of about 17 hot spots per 100 nucleotides from an initial density of about 9 hot spots per 100 nucleotides, and the overall frequency of cold spots decreased from about 15 cold spots per 100 nucleotides in the unmodified gene to about 9 cold spots per 100 nucleotides in the SHM susceptible form.

B. Cloning and Analysis

After final review to ensure that the synthetic polynucleotide sequence is free of extraneous restriction sites, the complete polynucleotide sequence is synthesized (DNA 2.0, Menlo Park, Calif.), cloned into one of DNA2.0's cloning vectors (see Table 11 below), and sequenced to confirm correct synthesis.

TABLE 11

Construct              DNA2.0 source plasmid   Restriction sites (5′, 3′)   Vector that insert is cloned into
cold TFP               pJ15                    Sac1, BsrG1                  AB136
hot TFP                pJ15                    Sac1, BsrG1                  AB102
GFP* stop (Y82stop)    pJ31                    Sac1, BsrG1                  AB105
cold hygromycin        pJ2                     NgoMIV, Xba1                 AB179, AB163
unmodified puromycin   pJ51                    NgoMIV, Xba1                 AB150, AB161
cold blasticidin       pJ13                    NgoMIV, Xba1                 AB102, AB153
cold AID               pJ45                    Sac1, BsrG1                  AB135, AB174
Human AID
Kappa enhancers        PCR amplification of genomic DNA

Other elements, for example E-box motifs or Ig enhancer elements, are created by either oligo synthesis or PCR amplification as described in Example 5 below.

To test functionality of the new synthetic inserts, coding regions are excised from DNA2.0 source vectors using restriction enzymes as listed in Table 11 above, and inserted into expression vectors (Table 11) using standard recombinant molecular biological techniques. Insertion of selection markers (i.e., cold blasticidin, cold hygromycin, and unmodified puromycin) into the AB series of vectors places them downstream of the EMCV IRES sequence (AB150, AB102, AB179; see FIG. 33A) or downstream of the pSV promoter (AB161, AB153, AB163; see FIG. 33B).

To test functional activity of the optimized synthetic genes, Hek 293 cells are plated at 4×10⁵/well in a 6-well microtiter dish. After 24 hours, transfections are performed using Fugene6 reagent from Roche Applied Sciences (Indianapolis, Ind.) at a reagent-to-DNA ratio of 3 μL:1 μg DNA per well. This ratio is also maintained for transfections with multiple plasmids. Transfections are carried out in accordance with the manufacturer's protocol.

To determine the relative stability/susceptibility of each construct to somatic hypermutation, stable cell lines of each transfected cell population are created, and tested to determine the relative speed by which they accumulate SHM mediated mutations. Because the majority of these mutations result in a loss of function, relative mutagenesis load is conveniently measured as a loss of fluorescence via FACS (see below and Example 4).
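Because the 3 μL:1 μg ratio is held constant when several plasmids are co-transfected, the reagent volume scales with the total DNA mass per well. The short Python sketch below illustrates that arithmetic; the function name and the example DNA masses are hypothetical and are not taken from this application.

```python
def fugene_volume_ul(plasmid_masses_ug, ratio_ul_per_ug=3.0):
    """Reagent volume (in µL) for one well, scaled to the total DNA mass (in µg)."""
    return ratio_ul_per_ug * sum(plasmid_masses_ug)


# Hypothetical DNA masses, for illustration only.
print(fugene_volume_ul([1.0]))       # single plasmid
print(fugene_volume_ul([0.5, 0.5]))  # two plasmids, same total DNA
```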

FACS Analysis. Prior to FACS analysis, cells are harvested by trypsinization, washed twice in PBS containing 1% w/v BSA, and re-suspended in 200 μl PBS/1% BSA containing 2 ng/ml DAPI. Cells are analyzed in the Cytopeia Influx with 200 mW 488 nm and 50 mW 403 nm laser excitation. Up to one million cells per sample are acquired. DAPI fluorescence is measured through a 460/50 bandpass filter. GFP fluorescence is measured through a 528/38 bandpass filter. Percent GFP expression is reported as percentage of DAPI excluding live cells with no detectable GFP fluorescence above cellular background.

Reversion assays to test for function of the canine AID gene. GFP* (GFP with a stop codon introduced by site directed mutagenesis at position 82 [Y82stop]) is co-transfected with AB174 (cold canine AID), and cells are analyzed by flow cytometry 3 days post transfection, placed under antibiotic selection and analyzed further by flow cytometry every other day for 13-15 days.

Antibiotic selections. Antibiotic concentrations used in the selection of Hek 293 cells are determined empirically by performing a kill curve (i.e., determining the minimal concentration of antibiotic that kills all un-transfected, and thus antibiotic sensitive, cells). At 3 days post transfection, cells are plated at 4×10⁵/well and selected at the following concentrations: 1.5 μg/ml puromycin (Clontech, Mountain View, Calif.); 16 μg/mL blasticidin (Invitrogen, Carlsbad, Calif.); and 360 μg/mL hygromycin (Invitrogen, Carlsbad, Calif.).

Resistance marker genes are tested for functionality by transfection of the appropriate expression plasmid (i.e., AB102 for blasticidin, AB179 for hygromycin) into Hek 293 cells, based on their ability to promote drug resistant cell growth in the presence of 16 μg/mL blasticidin (Invitrogen, Carlsbad, Calif.) or 360 μg/mL hygromycin (Invitrogen, Carlsbad, Calif.) for two weeks.

Transfection of the AB102 vector containing cold blasticidin resulted in the creation of drug resistant colonies of transfected Hek 293 cells at rates comparable to those obtained with the wild type gene.
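The kill curve readout can be expressed as a simple selection rule: take the lowest tested concentration at which no un-transfected cells survive. The Python sketch below illustrates this under stated assumptions; the function name and the survivor counts are hypothetical and are not data from this application.

```python
def minimal_killing_concentration(kill_curve):
    """Lowest concentration at which no un-transfected cells survive.

    kill_curve maps antibiotic concentration (µg/mL) -> surviving cell count.
    """
    lethal = [conc for conc, survivors in kill_curve.items() if survivors == 0]
    if not lethal:
        raise ValueError("no tested concentration killed all cells")
    return min(lethal)


# Hypothetical counts, for illustration only.
toy_curve = {0.5: 1200, 1.0: 40, 1.5: 0, 2.0: 0}
print(minimal_killing_concentration(toy_curve))
```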

Example 2 Creation of Synthetic Polynucleotides Encoding Hygromycin

A. Polynucleotide Design

The starting sequence for unmodified hygromycin is shown in FIG. 11, together with the initial analysis of hot spot and cold spot frequency.

As described for Example 1, sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7.

1. Cold Hygromycin

From iteration 1 to iteration 2000, an additional 71 cold spots are inserted into the gene, 12 existing hot spots are removed, and 61 CpG sites are removed, making the gene sequence less susceptible to somatic hypermutation. No further beneficial changes are observed upon further iterations.

An optimized sequence for a SHM resistant version of hygromycin created using this approach is shown in FIG. 12, together with the resulting changes in frequency of hot spots and cold spots. Optimization of the hygromycin sequence to make the sequence more resistant to somatic hypermutation resulted in an increase of 144% in number of cold spots (an increase of 71), and reduced the number of hot spots by about 17% (a decrease of 12). Overall the frequency of cold spots increased to an average density of about 22 cold spots per 100 nucleotides from an initial density of about 15 cold spots per 100 nucleotides, and the overall frequency of hot spots decreased from about 7 hot spots per 100 nucleotides in the unmodified gene to about 5 hot spots per 100 nucleotides in the SHM resistant form.

Transfection of the AB179 vector containing cold hygromycin resulted in the creation of drug resistant colonies of transfected Hek 293 cells at rates comparable to those obtained with the wild type gene.

2. Hot Hygromycin

Conversely, by increasing the probability of somatic hypermutation in a vector element such as a selectable marker, one is able to “reclaim” a marker for future use.

In the case of the hot hygromycin construct, the designed nucleotide sequence is maximized for hot spots, minimized for cold spots, minimized for CpG repeats and is rendered consistent with known mammalian optimized codon usage.

Optimization of the hygromycin sequence to make the sequence more susceptible to somatic hypermutation resulted in an increase of about 183% in number of hot spots (an increase of 61), and reduced the number of cold spots by about 42% (a decrease of 35) (FIG. 13). Overall the frequency of hot spots increased to an average density of about 13 hot spots per 100 nucleotides from an initial density of about 7 hot spots per 100 nucleotides, and the overall frequency of cold spots decreased from about 15 cold spots per 100 nucleotides in the unmodified gene to about 12 cold spots per 100 nucleotides in the SHM susceptible form.

B. Cloning and Analysis

After final review to ensure that a synthetic polynucleotide sequence is free of extraneous restriction sites, the complete polynucleotide sequence is synthesized (DNA 2.0, Menlo Park, Calif.), cloned into one of DNA2.0's cloning vectors (see Table 11; Example 1), sequenced to confirm correct synthesis and tested for activity as described above for Example 1.

To determine the relative stability/susceptibility of each of the constructs to somatic hypermutation, stable cell lines of each transfected cell population are created, and tested to determine the relative speed by which they accumulate SHM mediated mutations. Because the majority of these mutations result in a loss of function, relative mutagenesis load is conveniently measured as a loss of fluorescence via FACS (see Example 4).

Example 3 Creation of Synthetic Polynucleotides Encoding Reporter Genes

A. Polynucleotide Design

The starting sequence for unmodified Teal Fluorescent Protein (TFP) is shown in FIG. 14, together with the initial analysis of hot spot and cold spot frequency.

1. Hot TFP

As described for Example 1, sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7; the resulting hot and cold versions of TFP are shown in FIGS. 15 and 16, respectively.

Optimization of the TFP sequence to make the sequence more susceptible to somatic hypermutation resulted in an increase of about 170% in number of hot spots (an increase of 28), and reduced the number of cold spots by about 26% (a decrease of 27). Overall the frequency of hot spots increased to an average density of about 10 hot spots per 100 nucleotides from an initial density of about 6 hot spots per 100 nucleotides, and the overall frequency of cold spots decreased from about 15 cold spots per 100 nucleotides in the unmodified gene to about 11 cold spots per 100 nucleotides in the SHM susceptible form.

2. Cold TFP

Optimization of the TFP sequence to make the sequence more resistant to somatic hypermutation resulted in an increase of 120% in number of cold spots (an increase of 21), and reduced the number of hot spots by about 10% (a decrease of 4). Overall the frequency of cold spots increased to an average density of about 18 cold spots per 100 nucleotides from an initial density of about 15 cold spots per 100 nucleotides, and the overall frequency of hot spots decreased from about 6 hot spots per 100 nucleotides in the unmodified gene to about 5 hot spots per 100 nucleotides in the SHM resistant form.

B. Cloning and Analysis

After final review to ensure that the synthetic polynucleotide sequence is free of extraneous restriction sites, the complete polynucleotide sequence is synthesized (DNA 2.0, Menlo Park, Calif.), cloned into one of DNA2.0's cloning vectors (see Table 11; Example 1), sequenced to confirm correct synthesis and tested for activity as described below.

Hek 293 cells are transfected with the expression vectors (AB102 and AB136, as described above in Example 1) containing either hot or cold versions of TFP, each driven by an identical CMV promoter. Selection for stable expression began 3 days post transfection. Prior to FACS analysis, cells are harvested by trypsinization, washed twice in PBS containing 1% w/v BSA, and re-suspended in 200 μl PBS/1% BSA containing 2 ng/ml DAPI. Cells are analyzed in the Cytopeia Influx with 200 mW 488 nm and 50 mW 403 nm laser excitation. Up to one million cells per sample are acquired. DAPI fluorescence is measured through a 460/50 bandpass filter. GFP fluorescence is measured through a 528/38 bandpass filter. Percent GFP expression is reported in Table 12A as percentage of DAPI excluding live cells with no detectable GFP fluorescence above cellular background.

TABLE 12A Expression analysis of “hot” and “cold” versions of TFP

Construct                    % TFP Expressing cells   TFP Fluorescence   Control Fluorescence   Fold over control
Hot TFP (SHM susceptible)    63.74                    189.33             20.61                  9
Cold TFP (SHM resistant)     66.92                    429.72             19.93                  22
Hot TFP (SHM susceptible)    48.39                    183.21             20.09                  9
Cold TFP (SHM resistant)     51.20                    656.06             20.26                  32
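The “fold over control” values in Table 12A are consistent with the ratio of TFP fluorescence to control fluorescence, rounded to the nearest whole number. The Python sketch below reproduces them from the tabulated fluorescence values; reading the column as a simple rounded ratio is an inference from the numbers, and the function name is hypothetical.

```python
# (construct, TFP fluorescence, control fluorescence) as listed in Table 12A.
TABLE_12A = [
    ("Hot TFP", 189.33, 20.61),
    ("Cold TFP", 429.72, 19.93),
    ("Hot TFP", 183.21, 20.09),
    ("Cold TFP", 656.06, 20.26),
]


def fold_over_control(tfp_fluorescence, control_fluorescence):
    """Ratio of construct fluorescence to control fluorescence."""
    return tfp_fluorescence / control_fluorescence


for construct, tfp, control in TABLE_12A:
    print(construct, round(fold_over_control(tfp, control)))
```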

These results show good expression above background of both hot and cold versions of TFP. In this case, making the sequence “cold” produced the surprising result that relative expression of the protein is improved. Such improved expression provides an additional benefit to the SHM resistant synthetic genes.

To determine the relative stability/susceptibility of each construct to somatic hypermutation, stable cell lines of each transfected cell population are created, and tested to determine the relative speed by which they accumulate SHM mediated mutations. Because the majority of these mutations result in a loss of function, relative mutagenesis load is conveniently measured as a loss of fluorescence via FACS (see Example 4).

Episomal expression constructs carrying either a SHM optimized coding sequence for hot TFP or cold TFP were individually stably co-transfected with AID into HEK 293 cells and allowed to expand and grow for 3 weeks (the cold canine AID used in these experiments contains the NES-inactivating L198A mutation; SEQ ID NO: 22). Cell stocks were then frozen, and one vial each of hot TFP and cold TFP were thawed, grown in culture for 4 days, and then pulsed with supplemental AID by transiently transfecting the 4 day post-thaw culture with an additional aliquot of the original AID expression construct (termed “AID pulsing”). Cells were harvested by trypsinization nine days following the AID pulse, pelleted at 1150×g for 5 min., and frozen for later use.

Cell pellets were subsequently thawed and TFP ORFs were recovered by PCR using oligonucleotide (oligo) primers GTGGGAGGTCTATATAAGCAGAGC (SEQ ID NO: 339) and GATCGTCTCACGCGGATTGTAC (SEQ ID NO: 377). The former oligo amplifies from near the 3′ end of the CMV promoter used for driving expression of TFP mRNA, which lies 142 nt 5′ to the TFP start codon, and the latter oligo matches sequences ending 1 nt 3′ to the TFP stop codon.

Each PCR reaction (total volume of 50 μL) was run under the following conditions: 95° C. for 5 min, 35 cycles of (95° C. for 30 sec, 55° C. for 30 sec, 68° C. for 45 sec), followed by 1 min at 68° C. before cooling to 4° C. PCR amplified products were cloned into the TOPO® TA cloning vector (Invitrogen, Carlsbad, Calif.), and inserts were sequenced. A total of 166 hot and 111 cold TFP ORFs were rescued and sequenced, and the resulting spectra of mutations were compared. Global statistics for the mutations observed in the two sets of sequences are shown in Table 12B.

TABLE 12B Mutation metrics for cold- and hot-TFP

Template   # ORFs sequenced   # total mutations   # nt sequenced   nt per mutation   templates per mutation
coldTFP    111                18                  61050            3391              6.1
hotTFP     166                100                 88500            885               1.6

The mutation frequency is approximately 3.8-fold greater in the TFP template version with maximized hotspots than in the cold TFP sequence with minimized hotspots. The data demonstrate that SHM optimization of polynucleotide sequences can be used to either increase or decrease the frequency of mutations experienced by a polynucleotide encoding a protein of interest.
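The 3.8-fold figure follows directly from the counts in Table 12B: dividing nucleotides sequenced by total mutations gives the spacing between mutations for each template, and the ratio of the two spacings gives the fold difference. A brief Python sketch of that arithmetic follows; the dictionary layout and function names are illustrative choices.

```python
# Counts as listed in Table 12B.
TABLE_12B = {
    "coldTFP": {"orfs": 111, "mutations": 18, "nt_sequenced": 61050},
    "hotTFP": {"orfs": 166, "mutations": 100, "nt_sequenced": 88500},
}


def nt_per_mutation(entry):
    """Average number of sequenced nucleotides per observed mutation."""
    return entry["nt_sequenced"] / entry["mutations"]


def fold_difference(hot, cold):
    """Ratio of hot to cold mutation frequency (mutations per nucleotide)."""
    return nt_per_mutation(cold) / nt_per_mutation(hot)


print(nt_per_mutation(TABLE_12B["coldTFP"]))
print(nt_per_mutation(TABLE_12B["hotTFP"]))
print(fold_difference(TABLE_12B["hotTFP"], TABLE_12B["coldTFP"]))
```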

FIG. 16D shows the mutations for a representative segment of the hot and cold TFP constructs. The central row shows the amino acid sequence of TFP (residues 59 through 87) in single letter format, and the “hot” and “cold” starting nucleic acid sequences encoding the two constructs are shown above (hot) and below (cold) the amino acid sequence. Mutations observed in the hot sequence are aligned and stacked on top of the gene sequences, while mutations in the cold TFP sequence are shown below. The results illustrate how “silent” changes to the coding sequences generate dramatic changes in observed AID-mediated SHM rates, demonstrating that engineered sequences can be effectively optimized to create fast or slow rates of SHM.

FIG. 16E shows that the spectrum of mutations generated by AID in the present in vitro tissue culture system mirrors those observed in other studies and those seen during in vivo affinity maturation. FIG. 16E shows the mutations generated in the present study (box (i), upper left, n=118), and compares them with mutations observed by Zan et al. (box (ii), upper right, n=702), Wilson et al. (box (iii), lower left, n=25000), and a larger analysis of IGHV chains that have undergone affinity maturation (box (iv), lower right, n=101,926). The Y-axis in each chart indicates the starting nucleotide, the X-axis indicates the end nucleotide, and the number in each square indicates the percentage (%) of the time that nucleotide transition is observed. In the present study, the frequency of mutation transitions and transversions was similar to that seen in other data sets. Mutations of C to T and G to A are the direct result of AID activity on cytidines and account for 48% of all mutation events. In addition, mutations at bases A and T account for ~30% of mutation events (i.e., slightly less than frequencies observed in other datasets).
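A spectrum of the kind plotted in FIG. 16E is a table of percentages indexed by starting and ending nucleotide. The Python sketch below shows one way such a table could be computed from a list of observed substitutions; the function name and the toy mutation list are hypothetical and do not reproduce the data of the present study.

```python
from collections import Counter

BASES = "ACGT"


def mutation_spectrum(mutations):
    """Percentage of each (from, to) substitution among all observed mutations."""
    counts = Counter(mutations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    return {(a, b): 100.0 * counts[(a, b)] / total
            for a in BASES for b in BASES if a != b}


# Toy substitutions, for illustration only.
toy = [("C", "T")] * 3 + [("G", "A")] * 2 + [("A", "G")] * 3 + [("T", "C")] * 2
spectrum = mutation_spectrum(toy)

# Share attributable to direct deamination (C to T plus G to A).
print(spectrum[("C", "T")] + spectrum[("G", "A")])
# Share of mutations that start at an A or a T.
print(sum(pct for (start, _), pct in spectrum.items() if start in "AT"))
```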

FIG. 16F shows that mutation events are distributed throughout the SHM optimized nucleotide sequence of the hot TFP gene, with a maximum instantaneous rate of about 0.08 events per 1000 nucleotides per generation centered around 300 nucleotides from the beginning of the open reading frame. Stable transfection and selection of a gene with AID (for 30 days) produces a maximum rate of mutation of 1 event per 480 nucleotides. As a result, genes may contain zero, one, two or more mutations per gene. The distribution of SHM-mediated events observed in hot TFP sequenced genes can be seen in FIG. 16G, compared to the significantly reduced pattern of mutations seen in cold TFP (FIG. 16H).

Thus the present study demonstrates that the creation of non-synonymous versions of genes such as Teal-fluorescent protein (TFP) that do not normally undergo somatic hypermutation can be used to target such genes for high rates of somatic hypermutation. Additionally, the creation of SHM resistant genes (while encoding for the same amino acids) can lead to proteins that have a reduced number of somatic hypermutation hot-spots and, thus, experience a dramatically reduced level of AID mediated hypermutation. In each instance of SHM optimization, mammalian codon usage and other factors affecting gene expression levels were considered in generating the engineered sequences, leading to proteins that also exhibit reasonable levels of translation and expression. The results, therefore, demonstrate that the present methods of SHM optimization (i) can be successfully used to target the activity of AID to specific regions of an expressed gene; (ii) can be used to speed or slow the rate of SHM; (iii) demonstrate that the spectrum of mutations generated by AID using this methodology is equivalent to that observed in vivo; and (iv) demonstrate that SHM optimization can be successfully performed on a gene of interest to either positively or negatively impact its rate of AID-mediated SHM without significantly negatively impacting its expression.
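One way to picture why individual genes carry zero, one, two or more mutations is to treat mutation events as independent and Poisson distributed along the gene. That model is an illustrative assumption made here, not one stated in this application, and the gene length used below is hypothetical; only the rate of 1 event per 480 nucleotides comes from the text.

```python
import math


def mutation_count_distribution(gene_length_nt, nt_per_event=480.0, max_k=4):
    """Poisson probabilities of observing 0..max_k mutations in one gene copy.

    Assumes independent events at a uniform rate of 1 per nt_per_event
    nucleotides (a simplifying assumption; real rates vary along the gene).
    """
    lam = gene_length_nt / nt_per_event  # expected mutations per gene
    return [math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(max_k + 1)]


# Hypothetical 720 nt open reading frame.
for k, p in enumerate(mutation_count_distribution(720)):
    print(k, round(p, 3))
```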

Example 4 Creation of Synthetic Polynucleotides Encoding Enzymes Involved in SHM

The systems and methods described herein can be applied to enzymes such as, for example, AID, Pol eta, Pol theta and UDG.

A. Activation Induced Cytidine Deaminase (AID)

Analysis of sequence variations in cytidine deaminase (AID) between mammalian species (e.g., rat, chimpanzee, mouse, human, dog, cow, rabbit, chicken, frog, zebra fish, fugu and tetraodon (puffer fish)) as compared to humans demonstrates that organisms as distantly related as human and frog display a surprisingly high (70%) sequence identity, and >80% sequence similarity. In addition, it has been shown that AID from other organisms can be substituted for human AID in somatic hypermutation (SHM), and that all mammalian species of AID are functionally equivalent.

Shown in FIG. 17 is a comparison of human AID with other terrestrial AIDs in order to identify a promising beginning construct for SHM in vivo. The figure provides a sequence alignment of AID from human (H_sap/1-198), mouse (M_musc/1-198), canine (C_fam/1-198), rat (R_norv/1-199), and chimpanzee (P_trog/1-199). FIG. 21 illustrates the sequence identity between human, canine and mouse AID proteins.

Canine AID has overall 94% amino acid identity to human and mouse AID and, thus, is selected as the starting point for codon optimization. To optimize codon usage, the canine amino acid sequences are reverse translated and then iteratively optimized.

AID is known to contain a nuclear export signal, which is contained within the C-terminal 10 amino acids (McBride et al., J Exp Med. 2004 May 3; 199(9):1235-44; Ito et al., PNAS 2004 Feb. 17; 101(7):1975-80). For purposes of the experiments described below, the canine AID contains a leucine to alanine mutation at position 198, while the human AID construct retains the unmutated, intact nuclear export signal.

As described in Example 1, SHM sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7; the resulting hot and cold versions of canine AID are shown in FIGS. 19 and 20, respectively. The starting sequence for canine AID is shown in FIG. 18, together with the initial analysis of hot spot and cold spot frequency.
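Percent identity figures such as the one quoted above are typically computed over an alignment by counting matching positions. A minimal Python sketch follows; the function name, the choice to skip gapped columns, and the toy sequences are assumptions for illustration and are not the AID sequences of FIG. 17.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments, for illustration only.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```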

1. Hot AID

Optimization of the AID sequence to make the sequence more susceptible to somatic hypermutation resulted in an increase of about 200% in number of hot spots (an increase of 43), and reduced the number of cold spots by about 30% (a decrease of 23). Overall the frequency of hot spots increased to an average density of about 14 hot spots per 100 nucleotides from an initial density of about 7 hot spots per 100 nucleotides, and the overall frequency of cold spots decreased from about 13 cold spots per 100 nucleotides in the unmodified gene to about 9 cold spots per 100 nucleotides in the SHM susceptible form.

2. Cold AID

Optimization of the canine AID sequence to make the sequence more resistant to somatic hypermutation resulted in an increase of 186% in number of cold spots (an increase of 68), and reduced the number of hot spots by about 35% (a decrease of 14). Overall the frequency of cold spots increased to an average density of about 25 cold spots per 100 nucleotides from an initial density of about 13 cold spots per 100 nucleotides, and the overall frequency of hot spots decreased from about 7 hot spots per 100 nucleotides in the unmodified gene to about 5 hot spots per 100 nucleotides in the SHM resistant form.

After final review to ensure that the synthetic polynucleotide sequence is free of extraneous restriction sites, the complete polynucleotide sequence is synthesized (DNA 2.0, Menlo Park, Calif.), cloned into one of DNA2.0's cloning vectors (see Table 11; Example 1), sequenced to confirm correct synthesis and tested for activity as described below and in Example 1.

To determine canine AID activity, the cold or wild type versions of canine AID are co-transfected with expression vectors expressing the GFP* construct that contains a stop codon within its coding region (as described in Example 1) either in the presence or absence of Ig enhancer elements within the target vector sequence. Mutation of the stop codon by AID results in the creation of a functional fluorescent protein that is a direct indicator of AID activity.

In this experiment, cells are harvested by trypsinization, washed twice in PBS containing 1% w/v BSA, and re-suspended in 200 μl PBS/1% BSA containing 2 ng/ml DAPI. Cells are analyzed in the Cytopeia Influx with 200 mW 488 nm and 50 mW 403 nm laser excitation. Up to one million cells per sample are acquired and revertants are determined as percentage of DAPI excluding live cells with detectable GFP fluorescence above cellular background.
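The revertant readout reduces to two gates: keep live (DAPI excluding) cells, then count those whose GFP signal exceeds background. The Python sketch below shows that calculation on per-cell measurements; the function name, threshold values and toy events are hypothetical and do not correspond to instrument settings used in this application.

```python
def percent_revertants(events, dapi_max, gfp_background):
    """Percent of live cells with GFP fluorescence above background.

    events is an iterable of (dapi_signal, gfp_signal) pairs, one per cell.
    Cells with a DAPI signal below dapi_max are treated as live.
    """
    live_gfp = [gfp for dapi, gfp in events if dapi < dapi_max]
    if not live_gfp:
        return 0.0
    positive = sum(1 for gfp in live_gfp if gfp > gfp_background)
    return 100.0 * positive / len(live_gfp)


# Toy events, for illustration only: the third cell is DAPI positive (dead).
toy_events = [(10, 500), (12, 5), (900, 700), (8, 3), (11, 4)]
print(percent_revertants(toy_events, dapi_max=100, gfp_background=20))
```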

FIG. 22A shows the predicted effect of AID activity on protein function,in this type of assay. Of note is the observation that mutagenesis canproduce mutations that both initially restore or improve function andlater reduce or eliminate function. The balance in these two ratesgenerates early and rare mutation events that restore function, followedby secondary and ternary mutation events that destroy function in theseproteins. The net effect of these competing rates on the observation ofgain-of-function events in a population can be seen in FIG. 22B. Giventhree different assumptions regarding number of inactivating mutationsneeded to silence GFP, one would expect to observe three very differentprofiles of reversion events as a function of time, dependent on therate of enzymatic activity of the AID.

Thus although initial reversion rates can provide an accurate assessmentof AID activity, long term studies of activity require an analysis ofthe rate of extinction of activity, rather than reversion offluorescence.

To test this possibility, a cell line that is stably expressing afluorescent protein is transfected with 2 concentrations of expressionvector containing cold canine AID. Cells are stably maintained inculture and sample assayed for total fluorescence after the indicatedperiods of time.

Prior to FACS analysis, cells are harvested by trypsinization, washedtwice in PBS containing 1% w/v BSA, and re-suspended in 200 μl PBS/1%BSA containing 2 ng/ml DAPI. Cells are analyzed in the Cytopeia Influxwith 200 mW 488 nm and 50 mW 403 nm laser excitation. DAPI fluorescenceis measured through a 460/50 bandpass filter. GFP fluorescence ismeasured through a 528/38 bandpass filter. Percent GFP expression isreported as percentage of DAPI excluding live cells with no detectableGFP fluorescence above cellular background.

The results, shown in FIG. 22C, show a steady and sustained progressive,dose dependent decrease in GFP expression (shown as increasing GFPextinction) with time when co-expressed with increasing amounts of coldAID. The data are consistent with the hypothesis that cold AID is ableto introduce multiple mutations into a target gene, and is bothfunctional and stable when expressed in a “cold form” for many days.

To directly compare the ability of cold canine AID to exert mutagenesis,initial reversion assays are set up comparing cold canine AID with wildtype human AID. Hek 293 cells are transfected with the expressionvectors (as described above in Example 1) containing either the GFP* asdescribed above, or GFP* with the Kappa E3 and intronic enhancesinserted 5′ to the CMV promoter, together with either human or coldcanine AID. Selection for stable expression began 3 days posttransfection. Prior to FACS analysis, cells are harvested bytrypsinization, washed twice in PBS containing 1% w/v BSA, andre-suspended in 200 μl PBS/1% BSA containing 2 ng/ml DAPI. Cells areanalyzed in the Cytopeia Influx with 200 mW 488 nm and 50 mW 403 nmlaser excitation. Up to one million cells per sample are acquired. DAPIfluorescence is measured through a 460/50 bandpass filter. GFPfluorescence is measured through a 528/38 bandpass filter. Percent GFPexpression is reported as percentage of DAPI excluding live cells withno detectable GFP fluorescence above cellular background.

The results show (FIG. 22C) that canine AID exhibited significantly enhanced reversion activity compared to human AID. This experiment also shows the effect of the kappa 3′E and intronic enhancers on the rate of reversion experienced by the target gene when these are included in the expression vector. As shown, inclusion of the enhancer elements further enhanced reversion frequency.

B. Pol Eta

The starting unmodified sequence for Pol eta is shown in FIG. 23, together with the initial analysis of hot spot and cold spot frequency.

As described above, sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7; the resulting cold and hot versions of Pol eta are shown in FIGS. 24 and 25, respectively. Changes in hot and cold spot density are summarized below in Table 13:

TABLE 13 Summary of hot spot and cold spot changes

Gene     No. Hot Spots   No. Cold Spots   Hot Spot Density   Cold Spot Density
Native   177             307              8                  9
Cold     98              606              5                  28
Hot      359             185              17                 8

In Table 13, “Hot Spot Density” or “Cold Spot Density” refers to the total number of hot or cold spot motifs in the ORF, divided by the number of nucleotides in the ORF, multiplied by 100, rounded to the nearest whole number.
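The density definition above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical, and the handling of exact halves (rounded up here) is an assumption, since the text specifies only rounding to the nearest whole number.

```python
import math


def motif_density(motif_count, orf_length_nt):
    """Motifs per 100 nucleotides of ORF, rounded to the nearest whole number.

    Exact halves are rounded up; this tie-breaking rule is an assumption.
    """
    if orf_length_nt <= 0:
        raise ValueError("ORF length must be a positive number of nucleotides")
    density = 100.0 * motif_count / orf_length_nt
    return math.floor(density + 0.5)


# Illustrative, made-up values: 50 motifs in a 1000-nucleotide ORF.
example_density = motif_density(50, 1000)
```

Under this definition a gene with more motifs can still show a lower density if its ORF is longer, which is why the tables report both the raw counts and the density.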

C. Pol Theta

The starting amino acid and nucleic acid sequences for Pol theta are shown in FIG. 26 and in FIG. 27, respectively, together with the initial analysis of hot spot and cold spot frequency.

As described above, sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7; the resulting cold and hot versions of Pol theta are shown in FIG. 28 and in FIG. 29, respectively. Changes in hot and cold spot density (per 100 nucleotides) are summarized below in Table 14:

TABLE 14 Summary of hot spot and cold spot changes

Gene     Hot Spots #   Cold Spots #   Hot Spot Density   Cold Spot Density
Native   597           1054           8                  14
Cold     349           1847           4                  24
Hot      1053          700            13                 9

In Table 14, “Hot Spot Density” or “Cold Spot Density” refers to the total number of hot or cold spots in the ORF, divided by the number of nucleotides in the ORF, multiplied by 100, rounded to the nearest whole number.

D. UDG

The starting amino acid and nucleic acid sequences for UDG are shown in FIGS. 30A and 30B, respectively, together with the initial analysis of hot spot and cold spot frequency (FIG. 30C).

As described above, sequence optimization is completed using the computer program SHMredesign, based on the hot spot and cold spot motifs listed in Table 7; the resulting cold and hot versions of UDG are shown in FIGS. 31A and 31C, respectively, together with the analysis of hot spot and cold spot frequency (FIGS. 31B and 31D, respectively). Changes in hot and cold spot density (per 100 nucleotides) are summarized below in Table 15:

TABLE 15 Summary of hot spot and cold spot changes

Gene     Hot Spots #   Cold Spots #   Hot Spot Density   Cold Spot Density
Native   70            140            8                  15
Cold     44            269            5                  29
Hot      145           115            16                 13

In Table 15, “Hot Spot Density” or “Cold Spot Density” refers to the total number of hot or cold spots in the ORF, divided by the number of nucleotides in the ORF, multiplied by 100, rounded to the nearest whole number.

Example 5 Creation of Vectors for Somatic Hypermutation

Vectors are constructed from sub-fragments that are each synthesized by DNA2.0 (Menlo Park, Calif.). Vectors are able to simultaneously express multiple open reading frames and are capable of stable, episomal replication in mammalian cells that are naturally permissive or rendered permissive (i.e., via co-expression of human EBP2 (Habel et al., 2004; Kapoor et al., 2001)) for replication of Epstein Barr Virus (EBV) origin of replication (oriP) containing vectors.

Plasmids are rendered highly modular through the strategic placement of one or more restriction endonuclease recognition sequences (restriction sites) between discrete fragments throughout the vector.

A. Vector Formats

In the first format (FIG. 32A), vectors contained an internal ribosome entry site (IRES) from the encephalomyocarditis virus (EMCV). Elements contained within the vectors are operably linked together as shown in FIG. 32A and, in some cases, included the following functional elements (numbers refer to corresponding sequence information found further below in this section): 1) CMV promoter; 2) Multicloning sites; 3) Gene of interest; 4) IRES; 5) Eukaryotic selectable marker such as blasticidin S deaminase (bsd), hygromycin phosphotransferase (hyg) or puromycin-N-acetyl-transferase; 6) Terminator sequences (3′ untranslated region, small intron and polyA signals from SV40 (“IVS pA”)); 7) Epstein Barr Virus (EBV) origin of replication (oriP) (preceded by optional intergenic spacer region); 8) Prokaryotic origin of replication ColE1; 9) Prokaryotic selectable marker such as beta lactamase (bla) gene or kanamycin (kan); 10) gene fragment for copy number determination (such as beta actin or glucose-6-phosphate dehydrogenase (G6PDH)), and Ig enhancers.

In a second format (FIG. 32B), the expression vectors are made without an IRES, but contained instead an independent expression cassette for expressing a selectable marker gene. This expression cassette included 11) the SV40 immediate early promoter (pSV), a eukaryotic selectable marker, and IVS pA as described above. Elements contained within the vectors are operably linked together as shown in FIG. 32 and, in some cases, included the following functional elements: CMV promoter, multicloning sites, gene of interest, IVS pA, Epstein Barr Virus (EBV) origin of replication (oriP), pSV, selectable marker, IVS pA, prokaryotic origin of replication ColE1, prokaryotic selectable marker such as beta lactamase (bla) gene or kanamycin (kan), gene fragment for copy number determination, Ig enhancers, and multicloning sites.

In a third format (FIG. 33A), vectors contained a bidirectional promoter that drives expression of 2 different genes oriented in opposite directions. This vector also contained IRES sequences to generate 1 or 2 bi- or tri-cistronic messages. Elements contained within the vectors are operably linked together as shown in FIG. 33 using the same functional elements as described previously.

In a fourth format (FIG. 33B), vectors contain a bidirectional promoter, one or more IRES sequences that express bi- or tri-cistronic messages, and an independent, cis-linked cassette from which a eukaryotic selectable marker is expressed.

Any of the vectors can be interchanged with each other to form hybrids. In addition, any of the strong constitutive eukaryotic promoters contained on the episomal vector can be substituted with an inducible promoter (i.e., the reverse tetracycline transactivator promoter system [prtTA]) to achieve conditional expression of a desired gene. In this case, one of the other genes of interest should encode the transactivating protein, which can be expressed in cis on the same episome (as shown in FIG. 34), or supplied in trans on a second, transfected episomal vector.

The orientations for the prokaryotic selectable marker and colE1 origin of replication provided in sections 8 and 9 below (SEQ ID NOS: 9-11), and in FIGS. 33-35, are not absolute and can be reversed with respect to the remainder of the vector. Similarly, the orientation of the independent expression cassette (pSV-selectable marker (or other gene of interest)-IVS pA) can also be reversed with respect to the remainder of the vector (i.e., transcribing toward the oriP instead of the current portrayal of transcription away from the oriP). Additionally, enhancer elements, such as Ig enhancers, can be placed either 5′ or 3′ to the gene of interest, or can be excluded.

B. Representative Sequences of Functional Elements

1. A strong transcriptional promoter that works in eukaryotic cells. In FIGS. 32-33, the CMV promoter is used and the sequence is provided as SEQ ID NO: 1 (the TATA box sequence is shown underlined). The CMV promoter is altered to remove SacI and BsrGI sites.

(SEQ ID NO: 1) AGCTTGGCCCATTGCATACGTTGTATCCATATCATAATATCTACATTTATATTGGCTCATGTCCAACATTACCGCCATGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATC GCCTA.

2. A region encoding multiple restriction sites termed a multicloning site (mcs) region:

(SEQ ID NO: 2) TTCCCTGCAGGATTGTTTAAACACCAGATCTGCTTGAATCCGCGGATAAGAGGACTAGTATTCGTCTCACTAGGGAGAGCTCCTA.

3. A gene of interest can be, for example, a specific binding member, antibody or fragment thereof, antibody heavy or light chain, enzyme, receptor, peptide growth hormone or transcription factor.

4. An internal ribosome entry site (IRES), in FIGS. 32-34 from the encephalomyocarditis virus (EMCV), permits the concomitant bicistronic expression of two open reading frames (ORFs): one 5′ to itself, and a second 3′ to itself. A region containing 2 restriction sites (BsrGI and AscI) is shown 5′ to the IRES (lower case letters). The 3′ end of the IRES includes an NgoMIV site.

(SEQ ID NO: 3) tgtacaatccgcgtgagacgatcggcgcgccCGCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACA CGATGATAATATGGCCGGC.

5. The open reading frame (ORF) for a mammalian selectable marker gene, such as, for example, blasticidin S deaminase (bsd) (SEQ ID NO: 4), hygromycin phosphotransferase (hyg) (SEQ ID NO: 5), or puromycin-N-acetyl-transferase (SEQ ID NO: 6). Start and stop codons are underlined. 3′ to each ORF is an XbaI site (TCTAGA) used in the cloning step.

Blasticidin S deaminase (bsd; cold spot optimized)

(SEQ ID NO: 4) ATGGCCAAGCCTTTGTCTCAAGAAGAATCCACCCTCATTGAAAGGGCCACTGCTACAATCAACAGCATCCCCATCTCTGAAGACTACTCTGTCGCCAGCGCAGCTCTCTCCTCTGACGGGAGAATCTTCACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGCGCAGAGCTTGTGGTCCTGGGGACTGCTGCTGCTGCTGCAGCCGGAAACCTGACTTGTATCGTCGCCATAGGGAATGAGAACAGAGGCATCTTGAGCCCCTGTGGGAGATGCAGACAAGTCCTCCTGGACCTCCATCCTGGGATCAAAGCCATAGTGAAGGACAGTGATGGACAGCCCACAGCCGTTGGGATCAGGGAGTTGCTGCCATCTGGTTATGTGTGGGAGGGCTAA TCTAGA.

Hygromycin phosphotransferase (hyg; cold spot optimized)

(SEQ ID NO: 5)ATGAAAAAGCCTGAACTGACTGCCACCTCTGTTGAGAAGTTTTTAATAGAGAAGTTTGACTCTGTGTCAGACCTCATGCAGCTTTCTGAGGGAGAGGAGTCTAGAGCCTTTAGCTTTGATGTGGGGGGGAGAGGCTATGTCCTGAGAGTCAATAGCTGTGCAGATGGTTTCTACAAAGATAGGTATGTCTATAGACATTTTGCATCCGCCGCCCTCCCCATTCCAGAGGTCCTTGACATTGGGGAATTCTCAGAGAGCCTGACCTATTGCATTTCCCGGAGAGCCCAGGGTGTGACTCTTCAAGACCTGCCTGAGACAGAACTCCCTGCAGTGCTCCAGCCCGTCGCCGAGGCCATGGATGCAATCGCCGCCGCAGACCTCAGCCAGACCTCGGGGTTTGGGCCCTTTGGCCCCCAGGGGATAGGCCAATACACTACATGGAGAGATTTCATATGCGCTATTGCTGACCCCCATGTGTATCACTGGCAAACTGTGATGGACGACACAGTCTCAGCCTCTGTCGCACAAGCCCTGGACGAGCTGATGCTTTGGGCCGAGGACTGCCCAGAGGTCAGACATCTCGTCCATGCCGACTTTGGGTCAAACAATGTCCTGACGGACAATGGGAGAATCACTGCTGTCATTGACTGGAGCGAGGCCATGTTTGGGGACTCCCAATACGAGGTCGCCAACATCTTCTTCTGGAGACCCTGGTTGGCTTGTATGGAGCAGCAGACCCGTTACTTTGAGAGGAGGCATCCAGAGCTCGCTGGGAGCCCTAGATTGAGGGCCTATATGCTCAGGATAGGGCTTGACCAACTCTATCAGAGCTTGGTTGACGGCAATTTTGATGACGCAGCTTGGGCTCAGGGGAGATGCGACGCCATAGTGAGGAGTGGGGCCGGGACTGTCGGGAGAACTCAGATCGCCAGGAGGTCAGCTGCCGTCTGGACTGACGGCTGTGTAGAAGTCTTAGCCGACTCTGGGAACAGGAGACCCAGCACTCGTCCAGAGGCCAAGGAATGATCTAGA.

Puromycin-N-acetyl-transferase (Pur; wild type sequence).

Contains a Kozak consensus sequence immediately 5′ to the start codon (underlined). Stop codon is also underlined.

(SEQ ID NO: 6) CACCATGACCGAGTACAAGCCCACGGTGCGCCTCGCCACCCGCGACGACGTCCCCCGGGCCGTTCGCACCCTCGCCGCCGCGTTCGCCGACTACCCCGCCACGCGCCACACCGTGGACCCGGACAGGCACATCGAGCGGGTCACCGAGCTGCAAGAACTCTTCCTCACGCGCGTCGGGCTCGACATCGGCAAGGTGTGGGTCGCGGACGACGGCGCCGCTGTGGCGGTCTGGACCACGCCGGAGAGCGTCGAAGCGGGGGCGGTGTTCGCCGAGATCGGCCCGCGCATGGCCGAGTTGAGCGGTTCCCGGCTGGCCGCGCAGCAACAGATGGAAGGCCTCCTGGCGCCGCACCGGCCCAAGGAGCCCGCGTGGTTCCTGGCTACCGTCGGAGTCTCGCCCGACCACCAGGGCAAGGGTCTGGGCAGCGCCGTCGTGCTCCCCGGAGTGGAGGCTGCCGAGCGTGCCGGGGTGCCCGCCTTCCTCGAGACCTCCGCGCCCCGCAACCTCCCCTTCTACGAGCGGCTCGGCTTCACCGTCACCGCCGACGTCGAGGTGCCCGAAGGACCGCGCACCTGGTGCATGACCCGCAAGCCCGGTG CCTGATCTAGA.

6. Terminator sequences, IVS-pA (shown with 3′ BamH I).

(SEQ ID NO: 7) GGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGACAAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAAGTGTATAATGTGTTAAACTACTGATTCTAATTGTTGTGGTATTTTAGATTCCAACCTATGGAACTTATGAATGGGAGCAGTGGTGGAATGCCTTTAATGAGGAAAACCTGTTTTGCTCAGAAGAAATGCCATCTAGTGATGATGAGGCTACTGCTGACTCTCAACATTCTACTCCTCCAAAAAAGAAGAGAAAGGTAGAAGACCCCAAGGACTTTCCTTCAGAATTGGTAAGTTTTTTGAGTCATGCTGTGTTTAGTAATAGAACTCTTGCTTGCTTTGCTATTTACACCACAAAGGAAAAAGCTGCACTGCTATACAAGAAAATTATGGAAAAATATTTGATGTATAGTGCCTTGACTAGAGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTATCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGA TCC.

7. Sequence of EBV oriP. This element permits episomal replication in EBV oriP permissive cells that express Epstein Barr Nuclear Antigen 1 (EBNA1). The oriP sequence is preceded by an optional intergenic spacer region (small letters):

(SEQ ID NO: 8)actgtcttctttatcatgcaactcgtaggacaggtgccctggccgggtccGCAGGAAAAGGACAAGCAGCGAAAATTCACGCCCCCTTGGGAGGTGGCGGCATATGCAAAGGATAGCACTCCCACTCTACTACTGGGTATCATATGCTGACTGTATATGCATGAGGATAGCATATGCTACCCGGATACAGATTAGGATAGCATATACTACCCAGATATAGATTAGGATAGCATATGCTACCCAGATATAGATTAGGATAGCCTATGCTACCCAGATATAAATTAGGATAGCATATACTACCCAGATATAGATTAGGATAGCATATGCTACCCAGATATAGATTAGGATAGCCTATGCTACCCAGATATAGATTAGGATAGCATATGCTACCCAGATATAGATTAGGATAGCATATGCTATCCAGATATTTGGGTAGTATATGCTACCCAGATATAAATTAGGATAGCATATACTACCCTAATCTCTATTAGGATAGCATATGCTACCCGGATACAGATTAGGATAGCATATACTACCCAGATATAGATTAGGATAGCATATGCTACCCAGATATAGATTAGGATAGCCTATGCTACCCAGATATAAATTAGGATAGCATATACTACCCAGATATAGATTAGGATAGCATATGCTACCCAGATATAGATTAGGATAGCCTATGCTACCCAGATATAGATTAGGATAGCATATGCTATCCAGATATTTGGGTAGTATATGCTACCCATGGCAACATTAGCCCACCGTGCTCTCAGCGACCTCGTGAATATGAGGACCAACAACCCTGTGCTTGGCGCTCAGGCGCAAGTGTGTGTAATTTGTCCTCCAGATCGCAGCAATCGCGCCCCTATCTTGGCCCGCCCACCTACTTATGCAGGTATTCCCCGGGGTGCCATTAGTGGTTTTGTGGGCAAGTGGTTTGACCGCAGTGGTTAGCGGGGTTACAATCAGCCAAGTTATTACACCCTTATTTTACAGTCCAAAACCGCAGGGCGGCGTGTGGGGGCTGACGCGTGCCATCACTCCACAATTTCAAGAGAAAGAGTGGCCACTTGTCTTTGTTTATGGGCCCCATTGGCGTGGAGCCCCGTTTAATTTTCGGGGGTGTTAGAGACAACCAGTGGAGTCCGCTGCTGTCGGCGTCCACTCTCTTTCCCCTTGTTACAAATAGAGTGTAACAACATGGTTCACCTGTCTTGGTCCCTGCCTGGGACACATCTTAATAACCCCAGTATCATATTGCACTAGGATTATGTGTTGCCCATAGCCATAAATTCGTGTGAGATGGACATCCAGTCTTTACGGCTTGTCCCCACCCCATGGATTTCTATTGTTAAAGATATTCAGAATGTTTCATTCCTACACTAGGATTTATTGCCCAAGGGGTTTGTGAGGGTTATATTGGTGTCATAGCACAATGCCACCACTGAACCCATCGTCCAAATTTTATTCTGGATGCGTCACCTGAAACCTTGTTTTCGAGCACCTCACATACACCTTACTGTTCACAACTCAGCAGTTATTCTATTAGCTAAACGAAGGAGAATGAAGAAGCAGGCGAAGATTCAGGAGAGTTCACTGCCCGCTCCTTGATCTTCAGCCACTGCCCTTGTGACTAAAATGGTTCACTACCCTCGTGGAATCCTGACCCCATGTAAATAAAACCGTGACAGCTCATGGGGTGGGAGATATCGCTGTTCCTTAGGACCCTTTTACTAACCCTAATTCGATAGCATATGCTTCCCGTTGGGTAACATATGCTATTGAATTAGGGTTAGTCTGGATAGTATATACTACTACCCGGGAAGCATATGCTACCCGTTTAGGGTTAACAAGGGGGCCTTATAAACACTATTGCTAATGCCCTCTTGAGGGTCCGCTTATCGGTAGCTACACAGGCCCCTCTGATTGACGTTGGTGTAGCCTCCCGTAGTCTTCCTGGGCCCCTGGGAGGTACATGTCCCCCAG
CATTGGTGTAAGAGCTTCAGCCAAGAGTTACACATAAAGG.

8. Sequence of Escherichia coli origin of replication colE1, derived from vectors pJ15 and pJ31 from DNA2.0 (Menlo Park, Calif.):

(SEQ ID NO: 9)AAAAGGGGCCCGAGCTTAAGACTGGCCGTCGTTTTACAACACAGAAAGAGTTTGTAGAAACGCAAAAAGGCCATCCGTCAGGGGCCTTCTGCTTAGTTTGATGCCTGGCAGTTCCCTACTCTCGCCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGGCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGACGCGCGCGTAACTCACGTTAAGGGATTTTGGTCATGAGCTTGCGCCGTCCCGTCAAGTCAGCGTAATGCTCTG.

9A. Sequence of beta lactamase (bla) gene for resistance. The open reading frame (ORF) is shown in reverse orientation.

(SEQ ID NO: 10)CTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGCGCTGCGATGATACCGCGAGAACCACGCTCACCGGCTCCGGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATCGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATATTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTCAGTGTTACAACCAATTAACCAATTCTGAACATTATCGCGAGCCCATTTATACCTGAATATGGCTCATAACACCCCTTGCAGTGCGACTAACGGCATGAAGCTCGTCGGGGCGTACG.

9B. Sequence of kanamycin (kan), derived from vector pJ31 from DNA2.0 (Menlo Park, Calif.). The open reading frame (ORF) is shown in reverse orientation.

(SEQ ID NO: 11)CTTAGAAAAACTCATCGAGCATCAAATGAAACTGCAATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAAACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAACATCAATACAACCTATTAATTTCCCCTCGTCAAAAATAAGGTTATCAAGTGAGAAATCACCATGAGTGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACAGGCCAGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCTGAGCGAGGCGAAATACGCGATCGCTGTTAAAAGGACAATTACAAACAGGAATCGAGTGCAACCGGCGCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAATACCTGGAACGCTGTTTTTCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGCTTGATGGTCGGAAGTGGCATAAATTCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCATTGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCCCATACAAGCGATAGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCATGTTGGAATTTAATCGCGGCCTCGACGTTTCCCGTTGAATATGGCTCATATTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTCAGTGTTACAACCAATTAACCAATTCTGAACATTATCGCGAGCCCATTTATACCTGAATATGGCTCATAACACCCCTTGCAGTGCGACTAACGGCATGAAGCTCGTCGGGGAAATAATGATTTTATTTTGACTGATAGTGACCTGTTCGTTGCAACAAATTGATAAGCAATGCTTTCTTATAATGCCAACTTTGTACAAGAAAGCTGGGTTTTTTTTTTAGCCTGCTTTTTTGTACAAAGTTGGCATTATAAAAAAGCATTGCTCATCAATTTGTTGCAACGAACAGGTCACTATCAGTCAAAATAAAATCATTATTT.

10. A moiety used for determination of episomal copy number per cell. Ideally, the moiety contains a sequence that exists uniquely in the genome. Shown below are 2 fragments, beta actin and G6PDH, that can be used in vectors known in the art or described herein. Each fragment is bounded by a BsiWI and a ClaI site.

beta actin moiety

(SEQ ID NO: 12) CGTACGTACTCCTGCTTGCTGATCCACATCTGCTGGAAGGTGGACAGCGAGGCCAGGATGGAGCCGCCGATCCACACGGAGTACTTGCGCTCAGGAGGAGCAATGAAGCTTATCTGAGGAGGGAAGGGGACAGGCAGTGAGGACCCTGGATGTGACAGCTCCAAGCTTCCACACACCACAGGACCCCACAGCCGACCTGCCCAGGTCAGCTCAGGCAGGAAAGACACCCACCTTGATCTTCATTGTGCTGGGTGCCAGGGCAGTGATCTCCTTCTGCATCCTGTCATCGAT.

Human glucose-6-phosphate dehydrogenase (hG6PDH) moiety

(SEQ ID NO: 13) CGTACGAGGTGAGGCTGCAGTTCCATGATGTGTCCGGCGACATCTTCCACCAGCAGTGCAAGCGCAACGAGCTGGTGATCCGCGTGCAGCCCAACGAGGCCGTGTACCAGAGAAGGAGCAGTGTGGAGGGTGGGCGGCCTGGGCCCGGGGGACTCCACATGGTGGCAGGCAGTGGCATCAGCAAGACACTCTCTCCCTCACAGAACGTGAAGCTCCCTGACGCCTACGAGCGCCTCATCCTGGACGTCTTCTGCGGGAGCCAGATGCACTTCGTGCGCAGGAATCGAT.

11. pSV, immediate early promoter from SV40. The sequence is preceded by a BstBI site and followed by an NgoMIV site.

(SEQ ID NO: 14)TTCGAAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCTGACCCCTCACAAGGAGCCGGC.

Ig Enhancers. Representative Ig enhancer sequences include the heavy or light chain enhancers. The Kappa 3′ enhancer region (Ek3′) (see Meyer, K. B. and Neuberger, M. S., EMBO J. 8 (7), 1959-1964 (1989)) and the Kappa intronic enhancer region (Eki; LOCUS L80040, 7466 bp, ROD, 2 Sep. 2003) are shown below by way of example. At least 1 major active element within the enhancer regions is the E box sequence:

CAGGTG(N)₁₃CAGGTG [core sequence: CANNTG] (Storb et al., Immunity 19:235-242, 2003). The Ek3′ and Eki enhancer elements are obtained from Dr. Neuberger (MRC, UK). The Ek3′ sequence is amplified by PCR from Neuberger plasmid identification #1352, using the following primers, which contain an XhoI and EcoRI site, respectively, that are used for cloning: GACTACCTCGAGccagcttaggctacacagag (SEQ ID NO: 276) and GTAGTCGAATTCCCACATGTCTTACATGGTATATG (SEQ ID NO: 277).

The Eki enhancer sequence is amplified from Dr. Neuberger's vector (identification #Me123) using oligonucleotides GACTACGAATTCtcctgaggacacagtgatag (SEQ ID NO: 278) and GTAGTCGCGGCCGCCTAGTTCCTAGCTACTTCTTTA (SEQ ID NO: 279), which encode an EcoRI and NotI restriction site, respectively. Resulting fragments are digested with the appropriate restriction enzyme, and cloned sequentially into mcs2 (described below): Ek3′ is cloned into the XhoI and EcoRI sites of mcs2, and the resulting plasmid is then digested with EcoRI plus NotI, into which the Eki fragment is subsequently ligated to generate vector AB156.

As described above, E boxes are known to be present in the kappa enhancer region. Consequently, a synthetic cassette consisting of 3 tandemly arrayed E boxes is synthesized using the complementary oligonucleotides AATTCaggtgctggggtagggagcaggtgctacactgcagaccaggtgctGC (SEQ ID NO: 280) and ggccgcagcacctggtctgcagtgtagcacctgctccctaccccagcacctg (SEQ ID NO: 281), which when annealed contain EcoRI and NotI overhangs. The annealed oligo product is thus cloned into the EcoRI and NotI sites of mcs2 to generate vector AB157.
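As an illustrative sketch only, the E box core sequence CANNTG given above can be located in a sequence with a simple pattern search. The function and variable names are hypothetical; the oligonucleotide is the sense-strand sequence listed above as SEQ ID NO: 280, and case is ignored.

```python
import re

# E box core CANNTG, wrapped in a lookahead so that overlapping matches are reported.
E_BOX_CORE = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(sequence):
    """Return (0-based position, matched motif) for each CANNTG core in the sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in E_BOX_CORE.finditer(seq)]


# Sense-strand oligonucleotide of the synthetic 3x E box cassette (SEQ ID NO: 280).
oligo = "AATTCaggtgctggggtagggagcaggtgctacactgcagaccaggtgctGC"
hits = find_e_boxes(oligo)
```

Applied to this oligonucleotide, the search finds three CAGGTG motifs, with 13 nucleotides between consecutive motifs, matching the CAGGTG(N)₁₃CAGGTG arrangement.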

A representative Ig-kappa locus 3′ enhancer element is listed below (Accession number X15878).

(SEQ ID NO: 282)CCAGCTTAGGCTACACAGAGAAACTATCTAAAAAATAATTACTAACTACTTAATAGGAGATTGGATGTTAAGATCTGGTCACTAAGAGGCAGAATTGAGATTCGAAGCCAGTATTTTCTACCTGGTATGTTTTAAATTGCAGTAAGGATCTAAGTGTAGATATATAATAATAAGATTCTATTGATCTCTGCAACAACAGAGAGTGTTAGATTTGTTTGGAAAAAAATATTATCAGCCAACATCTTCTACCATTICAGTATAGCACAGAGTACCCACCCATATCTCCCCACCCATCCCCCATACCAGACTGGTTATTGATTTTCATGGTGACTGGCCTGAGAAGATTAAAAAAAGTAATGCTACCTTATTGGGAGTGTCCCATGGACCAAGATAGCAACTGTCATAGCTACCGTCACACTGCTTTGATCAAGAAGACCCTTTGAGGAACTGAAAACAGAACCTTAGGCACATCTGTTGCTTTCGCTCCCATCCTCCTCCAACAGCCTGGGTGGTGCACTCCACACCCTTTCAAGTTTCCAAAGCCTCATACACCTGCTCCCTACCCCAGCACCTGGCCAAGGCTGTATCCAGCACTGGGATGAAAATGATACCCCACCTCCATCTTGTTTGATATTACTCTATCTCAAGCCCCAGGTTAGTCCCCAGTCCCAATGCTTTTGCACAGTCAAAACTCAACTTGGAATAATCAGTATCCTTGAAGAGTTCTGATATGGTCACTGGGCCCATATACCATGTAAGACATGTGG.

A representative Kappa intronic enhancer region, Eki is presented below:

(SEQ ID NO: 283)TCCTGAGGACACAGTGATAGGAACAGAGCCACTAATCTGAAGAGAACAGAGATGTGACAGACTACACTAATGTGAGAAAAACAAGGAAAGGGTGACTTATTGGAGATTTCAGAAATAAAATGCATTTATTATTATATTCCCTTATTTTAATTTTCTATTAGGGAATTAGAAAGGGCATAAACTGCTTTATCCAGTGTTATATTAAAAGCTTAATGTATATAATCTTTTAGAGGTAAAATCTACAGCCAGCAAAAGTCATGGTAAATATTCTTTGACTGAACTCTCACTAAACTCCTCTAAATTATATGTCATATTAACTGGTTAAATTAATATAAATTTGTGACATGACCTTAACTGGTTAGGTAGGATATTTTTCTTCATGCAAAAATATGACTAATAATAATTTAGCACAAAAATATTTCCCAATACTTTAATTCTGTGATAGAAAAATGTTTAACTCAGCTACTATAATCCCATAATTTTGAAAACTATTTATTAGCTTTTGTGTTTGACCCTTCCCTAGCCAAAGGCAACTATTTAAGGACCCTTTAAAACTCTTGAAACTACTTTAGAGTCATTAAGTTATTTAACCACTTTTAATTACTTTAAAATGATGTCAATTCCCTTTTAACTATTAATTTATTTTAAGGGGGGAAAGGCTGCTCATAATTCTATTGTTTTTCTTGGTAAAGAACTCTCAGTTTTCGTTTTTACTACCTCTGTCACCCAAGAGTTGGCATCTCAACAGAGGGGACTTTCCGAGAGGCCATCTGGCAGTTGCTTAAGATCAGAAGTGAAGTCTGCCAGTTCCTCCCAGGCAGGTGGCCCAGATTACAGTTGACCTGTTCTGGTGTGGCTAAAAATTGTCCCATGTGGTTACAAACCATTAGACCAGGGTCTGATGAATTGCTCAGAATATTTCTGGACACCCAAATACAGACCCTGGCTTAAGGCCCTGTCCATACAGTAGGTTTAGCTTGCTACACCAAAGGAAGCCATACAGAGGCTAATACCAGAGTATTCTTGGAAGAGACAGGAGAAAATGAAAGCCAGTTTCTGCTCTTACCTTATGTGCTTGTGTTCAGACTCCCAAACATCAGGAGTGICAGATAAACTGGTCTGAATCTCTGTCTGAAGCATGGAACTGAAAAGAATGTAGTTTCAGGGAAGAAAGGCAATAGAAGGAAGCCTGAGAATATCTTCAAAGGGTCAGACTCAATTTACTTTCTTAAAGAAGTAGCTAGGAACTAG.

C. Vector Construction

Vector Format 1. The functional elements described below are ordered as 7 synthetic DNA fragments, each cloned into vector pJ2 or pJ15 from DNA2.0 (Menlo Park, Calif.). Both pJ vectors contain a colE1 E. coli origin of replication and a selectable marker (amp for pJ15, kan for pJ2); the sequences of each have been altered by DNA2.0 to minimize restriction sites. The pJ vector inserts are designed to contain one or more of the genetic elements listed below for the final vector construct, bounded by restriction sites that allow for the correct assembly of fragments in the desired order.

Vector F1, a covalently closed circular plasmid, contains (in order): DNA2.0 vector pJ15 (which contains the colE1 ori and amp resistance marker), restriction sites BsiWI, SacI, BsrGI, NgoMIV, XbaI, BamHI, MluI, approximately 800 bp of the EBV oriP, and AflII (see FIG. 35A).

Insert F2 contains approximately 880 bp of the oriP, including a highly repetitive portion, from the natively encoded restriction sites Mlu I to Nsi I. This fragment is recovered from a clone purchased from the American Type Culture Collection (catalog number ATCC 59562) using restriction sites Mlu I to Nsi I.

Insert F3 is reclaimed from pJ2 and encodes the SV40 3′ untranslated region and polyadenylation signals (3′ut/pA), a BamHI restriction site, and the remaining portion of the oriP. The fragment is bounded by NsiI and XbaI sites.

Insert F4 is reclaimed from pJ2 and contains the eukaryotic antibiotic resistance marker puromycin-N-acetyl-transferase (pur) bounded by restriction sites NgoMIV and XbaI.

Insert F5 is reclaimed from pJ15 and contains the EMCV IRES sequence bounded by restriction sites NgoMIV and BsrGI.

Insert F6 is reclaimed from pJ15 and contains a synthetic version of the teal fluorescent protein open reading frame (hotTFP) that has been altered to render it more susceptible to somatic hypermutation (SHM), and thus the appellation “hot.” The fragment is delineated by BsrGI and SacI restriction sites.

Vector F7, a covalently closed circular plasmid, contains (in order): DNA2.0 vector pJ15 (which contains the amp resistance marker and colE1 ori), restriction sites AflII, BamHI, XbaI, NgoMIV, BsrGI, SacI, multicloning site region 1 (mcs1), CMV immediate early promoter, multicloning site region 2 (mcs2), beta actin (β actin) moiety for determination of episomal copy number per cell, BsiWI (see FIG. 35B).

The Mcs1 contains the following restriction sites: (5′) SbfI, PmeI,BglII, SacII, SpeI, BsmBI, and SacI (3′). Mcs2 contains (5′) ClaI, XhoI,BclI, EcoRI, AgeI, NotI, and NheI (3′).
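As a hedged illustration, the stated 5′ to 3′ order of the mcs1 restriction sites can be checked against a sequence by locating each recognition sequence. The sketch below assumes that SEQ ID NO: 2 (the multicloning site region listed above) corresponds to mcs1; the function and variable names are hypothetical, and the recognition sequences are the standard ones for the named enzymes.

```python
# Multicloning site region (SEQ ID NO: 2); assumed here to correspond to mcs1.
MCS_REGION = (
    "TTCCCTGCAGGATTGTTTAAACACCAGATCTGCTTGAATCCGCGG"
    "ATAAGAGGACTAGTATTCGTCTCACTAGGGAGAGCTCCTA"
)

# Enzymes in the stated 5' to 3' order, with their recognition sequences.
MCS1_SITES = [
    ("SbfI", "CCTGCAGG"),
    ("PmeI", "GTTTAAAC"),
    ("BglII", "AGATCT"),
    ("SacII", "CCGCGG"),
    ("SpeI", "ACTAGT"),
    ("BsmBI", "CGTCTC"),
    ("SacI", "GAGCTC"),
]


def site_positions(sequence, sites):
    """Return (enzyme, first 0-based position) pairs; position is -1 if the site is absent."""
    seq = sequence.upper()
    return [(name, seq.find(motif)) for name, motif in sites]


positions = site_positions(MCS_REGION, MCS1_SITES)
all_present = all(pos >= 0 for _, pos in positions)
in_stated_order = [pos for _, pos in positions] == sorted(pos for _, pos in positions)
```

Only the first occurrence of each recognition sequence on the listed strand is reported, which is sufficient here because each site is intended to be unique within the region.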

The prototypic vector AB102 is assembled as described below using standard molecular biological manipulations. All final constructs are confirmed by sequence analysis.

1. A triple ligation is performed to incorporate inserts F5 (reclaimed with BsrG I and NgoM IV) and F6 (reclaimed with Sac I and BsrG I) into vector F7 to make vector ANA113.

2. A second triple ligation is performed to integrate inserts F2 (Mlu I to Nsi I) and F3 (Nsi I to Xba I) into vector F1 to generate vector ANA110.

3. A double ligation is performed to include insert F4 (from Xba I to NgoMIV) in vector ANA110, thus generating vector ANA119.

4. Finally, a double ligation is performed to assimilate the insert from ANA113 above (BsiW I to NgoM IV) into the BsiW I and NgoM IV cut vector ANA119 to generate AB102.
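The four ligation steps above can be sanity-checked with a minimal sketch that models each fragment by the enzymes that generate its two ends. This is an illustration under stated assumptions, not the described cloning procedure: the helper name is hypothetical; the vector cut sites in steps 1 to 3 are inferred from the insert ends and are not stated explicitly in the text; and ends are treated as compatible only when produced by the same enzyme.

```python
from collections import Counter


def ends_pair_up(fragments):
    """Necessary condition for a directional circular ligation.

    fragments: dict mapping fragment name to (enzyme_at_one_end, enzyme_at_other_end).
    Returns True if every fragment has two different ends and every enzyme end
    occurs exactly twice overall, so that each end has exactly one partner.
    """
    if any(a == b for a, b in fragments.values()):
        return False
    counts = Counter(end for ends in fragments.values() for end in ends)
    return all(n == 2 for n in counts.values())


# Insert ends are taken from the text; vector cuts in steps 1-3 are inferred assumptions.
assembly_steps = {
    "ANA113": {"F7 vector": ("SacI", "NgoMIV"), "F5": ("BsrGI", "NgoMIV"), "F6": ("SacI", "BsrGI")},
    "ANA110": {"F1 vector": ("MluI", "XbaI"), "F2": ("MluI", "NsiI"), "F3": ("NsiI", "XbaI")},
    "ANA119": {"ANA110 vector": ("XbaI", "NgoMIV"), "F4": ("XbaI", "NgoMIV")},
    "AB102": {"ANA119 vector": ("BsiWI", "NgoMIV"), "ANA113 insert": ("BsiWI", "NgoMIV")},
}

step_ok = {product: ends_pair_up(parts) for product, parts in assembly_steps.items()}
```

The check is deliberately simple: it does not model cohesive ends shared by different enzymes, and it does not detect internal occurrences of a site within a fragment.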

The judicious placement of restriction sites renders AB102 highly modular (see FIGS. 36A and 36B). Consequently, most subsequent vector formats (i.e., vector formats 2-4) are, in some cases, generated by simple fragment exchange. For example, the open reading frames for blasticidin S deaminase (cold bsd) and hygromycin phosphotransferase (cold hyg) are synthesized at DNA2.0 using codons that are SHM resistant, as described herein (thus the designation “cold”). Both of these markers are bounded by restriction sites NgoMIV and XbaI and are therefore easily cloned in place of pur (also delineated by NgoMIV and XbaI) to generate versions of the vectors that confer resistance to bsd and hyg, respectively.

Similarly, other genes of interest are synthesized so as to be bordered on the 5′ side by one of the unique restriction sites in mcs1 plus BsrGI or AscI at the 3′ end. Such genes cloned into the position initially occupied by TFP (FIGS. 36A and 36B) included other fluorescent proteins (GFP, and “GFP*” [a GFP that contains an in frame stop codon in lieu of tyr82]), immunoglobulins such as antibody heavy or light chains, activation induced cytidine deaminase (AID, also known as AICDA), and the reverse tetracycline transactivatable protein (rtTA).

There are several versions of beta actin and G6PDH, differing by length, that permit the simultaneous identification and quantification of more than one episomal vector. Each version is cloned into the same location using the BsiWI and ClaI sites.

Different varieties of the CMV immediate early promoter (i.e., versions in which the SacI and BsrGI restriction sites have been eliminated, and versions that include more or fewer nucleotides at the 5′ end of the enhancer region) are swapped in and out of AB vector series constructs by using NheI and SbfI, which are unique restriction sites found in mcs2 (5′) and mcs1 (3′), respectively.

Vector Format 5. AB184, a derivative of AB102, is an archetypical example of a vector in which expression of a gene of interest can be induced by addition of doxycycline (dox) (FIG. 37A). AB184 differs from AB102 in the following ways:

The TFP open reading frame is replaced with a cold codon version of the rtTA coding region (see, e.g., Gossen and Bujard, Proc Natl Acad Sci USA. (1992) 89(12):5547-51; Gossen et al., Science (1995) 268(5218):1766-9). The rtTA gene is synthesized at DNA2.0 in vector pJ2 and is reclaimed by PCR amplification such that BglII is included as a cloning site on the 5′ (sense) primer and AscI is included on the 3′ (antisense) primer. The PCR fragment is cloned into the BglII (found in mcs1) and AscI sites of AB102 (FIG. 38).

The hyg open reading frame is synthesized by DNA2.0 in pJ2 with cold codon usage, and with 5′ NgoMIV and 3′ XbaI sites, to make vector ANA112. The hyg insert from this vector replaced pur as the selectable marker (FIG. 38) in AB184.

ANA112 is also used to supply the prokaryotic resistance marker kan. A BsiWI site is added 5′ to the kan marker by site directed mutagenesis (SDM) to generate ANA136. The contiguous kan and colE1 fragment is reclaimed from ANA136 using BsiWI and AflII and replaced the homologous amp-colE1 fragment in AB102 to generate the kan resistant precursor to AB184.

A BstBI site is added next to the unique AflII site (order of elements: kan-colE1-AflII-BstBI . . . ).

Finally, a dox-inducible cold canine AID expression cassette is created and added to the vector at the AflII site found between the oriP and colE1 ori. The steps to do this are described below.

Vector F10 is synthesized at DNA2.0 and contains the following elements (in order): i. colE1 ori and kan marker from pJ2; ii. SacI site; iii. the human growth hormone (hGH) minimal promoter (sequence ACAGTGGGAGAGAAGGGGCCAGGGTATAAAAAGGGCCCACAAGAGACCAGCTCAAGGATTCCAAGG C; SEQ ID NO: 284); iv. SpeI site; v. canine AID with cold codon usage, and a single mutation (L198A) that disables the nuclear export signal; and vi. AgeI site.

Vector F10 contains an unwanted AflII site 3′ to the AgeI site (FIG. 37B). This site is removed by site directed mutagenesis. An AflII site is added by site directed mutagenesis 5′ to the SacI site to generate ANA165.

A 7-fold repeat of the tet operator elements, including the minimal hGH promoter sequence listed in (a) above, is synthesized at DNA2.0 in plasmid pJ51 to generate vector 7xprtTA. The insert order is (5′) AflII-7× prtTA operator elements-SacI-hGH promoter-SpeI (3′). The 7× repeat is reclaimed from pJ51 with AflII and SpeI and cloned into the cognate sites of ANA165 to generate ANA209 (FIG. 38).

To complete the expression cassette, a triple ligation is performed that includes (1) AflII-cut and CIP (calf intestinal phosphatase) treated vector pJ15; plus (2) insert 7xprtTA-coldCanine AIDnes(−) from Age I to Afl II; plus (3) PCR amplified 3′ ut/pA fragment (originally synthesized as insert F3), in which Age I (5′) and Afl II (3′) restriction sites are incorporated into the PCR primers. This creates vector ANA285.

Finally, the entire inducible expression cassette (which contains, in order, the following genetic elements: AflII-7xprtTA-canine AID-3′ ut/pA-AflII) is reclaimed from ANA285 by PCR using oligos that include BstBI sites and span the AflII site on the 5′ side. The reclaimed insert thus has the following genetic elements: AflII-BstBI-7xprtTA-canine AID-3′ ut/pA-AflII-BstBI. This PCR amplified product is digested with BstBI and is cloned into the unique BstBI site in the AB184 precursor described in step (3) above to generate AB184 (FIG. 37A). (Note: the AflII site at the 3′ end of the cassette is part of the cloning vector and is not carried into the final construct by the PCR amplified insert.)

Example 6 Application of SHM to Generation of Novel Lantibiotic Synthesis

The evolution of bacteria with resistance to existing therapeutic regimens has sparked interest in the discovery and development of novel antibiotics. Ideal candidates for further research are those that act via multiple modes of action, making resistance significantly more difficult to attain. One such antibiotic is Nisin.

Nisin is a natural product of Lactococcus lactis, a lantibiotic with a broad spectrum of activity against Gram-positive bacteria, commonly used in food preservation against such pathogens as Listeria monocytogenes and Clostridium botulinum (Bavin et al., Lancet. 1952 Jan. 19; 1(3):127-9). Nisin is a ribosomally translated and post-translationally modified peptide which, despite decades of use by the food industry, has not seen the induction of common resistance mechanisms. This finding is likely a result of two facts. First, the biocidal activity of Nisin comes from its binding to Lipid II and secondary induction of pore formation (Breukink et al., Nat Rev Drug Discov. 2006 April; 5(4):321-32). Second, Lipid II is a bacterial cell-wall component that is not easily modified by Gram-positive bacteria and whose use forms a rate-limiting step in the generation of the bacterial cell wall. Nisin also acts to inhibit spore formation.

Nisin is currently in preclinical development for the treatment of several bacterial pathogens. It displays a spectrum of activity towards several pathogens, including multi drug-resistant Streptococcus pneumoniae, vancomycin-resistant Enterococcus faecium, and Streptococcus pyogenes, all areas where new therapeutics are desperately needed (Goldstein et al., J. Antimicrob. Chemother. 1998 August; 42(2):277-8). In one study, Nisin was shown to be 8-16 times more potent in the treatment of S. pneumoniae (in mice) than vancomycin (Brumfitt et al., J Antimicrob Chemother. 2002 November; 50(5):731-4).

Despite these promising features, Nisin and other lantibiotics suffer from several important limitations. First, bacteria, even closely related (isogenic) species, display a significant variation in their sensitivity to Nisin and other lantibiotics. Second, Nisin is cleared quickly from the mammalian circulatory system. For Nisin to become a truly efficacious therapeutic, it will need to have improved pharmacodynamic properties with a broad spectrum of biocidal activity. Here we discuss application of SHM to engineer a Nisin with improved qualities.

Biosynthesis of bioactive Nisin has been shown to be dependent on only five L. lactis proteins, NisA, NisB, NisC, NisP, and NisT (Kuipers et al., J Biol. Chem. 2004 May 21; 279(21):22176-82; Rink et al., Biochemistry. 2005 Jun. 21; 44(24):8873-82). NisA encodes a precursor peptide which is dehydrated at several serine and threonine positions by NisB, leading to a modified peptide that is cyclized at five positions by NisC. Finally, the pro-antibiotic has its leader peptide cleaved by protease NisP, and is excreted to the media by transporter NisT (see FIG. 47A). The five thioether rings, each catalyzed by NisC, are termed lanthionines, and define the lantibiotic family of modified peptide antibiotics.

The modular nature of this pathway, the easy assay for bioactivity, and the broad specificity and activity of the dehydratase NisB and cyclase NisC make this an ideal target for SHM driven co-evolution to produce novel antibiotic constructs. In one approach, such a strategy could be based on making certain genes, or portions of genes, more susceptible to SHM, while making other genes, or portions of those genes, resistant to SHM.

The amino acid sequences of the 5 genes involved in Nisin biosynthesis are shown below. In these sequences, bold residues indicate those positions to be made hot to SHM, while underlined residues are those to be made cold to SHM.

NisA, Native Gene >NisA|gi|530218|gb|AAA26948.1| nisin [Lactococcus lactis]

(SEQ ID NO: 285) MSTKDFNLDLVSVSKKDSGASPRITSISLCTPGCKTGALMGCNMKTATCHCSIHVSK

NisC, Native Gene >NisC|gi|44045|emb|CAA48383.1| nisC [Lactococcus lactis]

(SEQ ID NO: 286)MRIMMNKKNIKRNVEKIIAQWDERTRKNKENFDFGELTLSTGLPGIILMLAELKNKDNSKIYQKKIDNYIEYIVSKLSTYGLLTGSLYSGAAGIALSILHLREDDEKYKNLLDSLNRYIEYFVREKIEGFNLENITPPDYDVIEGLSGILSYLLLINDEQYDDLKILIINFLSNLTKENNGLISLYIKSENQMSQSESEMYPLGCLNMGLAHGLAGVGCILAYAHIKGYSNEASLSALQKIIFIYEKFELERKKQFLWKDGLVADELKKEKVIREASFIRDAWCYGGPGISLLYLYGGLALDNDYFVDKAEKILESAMQRKLGIDSYMICHGYSGLIEKSLFKRLLNTKKFDSYMEEFNVNSEQILEEYGDESGTGFLEGISGCILVLSKFEYSINFTYWRQALLLFDDFLKGGKRK

NisB, Native Gene >gi|473018|emb|CAA79468.1| NisB protein [Lactococcus lactis]

(SEQ ID NO: 287)MIKSSFICAQPFLVRNTILSPNDKRSFTEYTQVIETVSKNKVFLEQLLLANPKLYNVMQKYNAGLLKKKRVKKLFESIYKYYKRSYLRSTPFGLFSETSIGVFSKSSQYKLMGKITKGIRLDTQWLIRLVHKMEVDFSKKLSFTRNNANYICFGDRVFQVYTINSSELEEVNIKYTNVYQIISEFCENDYQKYEDICETVTLCYGDEYRELSEQYLGSLIVNHYLISNLQKDLLSDFSWDTFLTKVEAMEDKKYIIPLKKVQKFIQEYSEIEIGEGIEKLKEIYQEMSQILENDNYIQEDLISDSEINFDVKQKQQLEHLAEFLGNTTKSVRRTYLDDYKDKFIEKYGVDQEVQITELFDSTFGIGAPYNYNHPRNDFYESEPSTLYYSEEEREKYLSMYVEAVKNHNVINLDDLESHYQKMDLEKKSELQGLELFLNLAKEYEKDIFILGDIVGNNNLGGASGRFSALSPELTSYHRTIVDSVERENENKEITSCEIVFLPENIRHANVMHTSIMRRKVLPFFTSTSHNEVQLTNIYIGIDEKEKFYARDISTQEVLKFYITSMYNKTLFSNELRFLYEISLDDKFGNLPWELIYRDFDYIPRLVFDEIVISPAKWKIWGRDVNNMTIRELIQSKEIPKEFYIVNGDNKVYLSQENPLDMEILESAIKKSSKRKDFIELQEYFEDENIINKGQKGRVADVVVPFIRTRALGNEGRAFIREKRVSVERREKLPFNEWLYLKLYISINRQNEFLLSYLPDIQKIVANLGGKLFFLRYTDPKPHIRLRIKCSDLFLAYGSILEILKRSQKNRIMSTFDISIYDQEVERYGGFDTLELSEAIFCADSKIIPNLLTLIKDTNNDWKVDDVSILVNYLYLKCFFQNDNKKILNFLNLVSPKKVKENVNEKIEHYLKLLKVDNLGDQIFYDKNFKELKHAIKNLFLKMIAQDFELQKVYSIIDSIIHVHNNRLIGIERDKEKLIYYTLQRLFVSEEYMK

NisP, Native Gene >gi|730155|sp|Q07596|NISP_LACLA Nisin leader peptide-processing serine protease nisP precursor

(SEQ ID NO: 288)MKKILGFLFIVCSLGLSATVHGETTNSQQLLSNNINTELINHNSNAILSSTEGSTTDSINLGAQSPAVKSTTRTELDVTGAAKTLLQTSAVQKEMKVSLQETQVSSEFSKRDSVTNKEAVPVSKDELLEQSEVVVSTSSIQKNKILDNKKKRANFVTSSPLIKEKPSNSKDASGVIDNSASPLSYRKAKEVVSLRQPLKNQKVEAQPLLISNSSEKKASVYTNSHDFWDYQWDMKYVTNNGESYALYQPSKKISVGIIDSGIMEEHPDLSNSLGNYFKNLVPKGGFDNEEPDETGNPSDIVDKMGHGTEVAGQITANGNILGVAPGITVNIYRVFGENLSKSEWVARAIRRAADDGNKVINISAGQYLMISGSYDDGTNDYQEYLNYKSAINYATAKGSIVVAALGNDSLNIQDNQTMINFLKRFRSIKVPGKVVDAPSVFEDVIAVGGIDGYGNISDFSNIGADAIYAPAGTTANFKKYGQDKGVSQGYYLKDWLFTTANTGWYQYVYGNSFATPKVSGALALVVDKYGIKNPNQLKRFLLMNSPEVNGNRVLNIVDLLNGKNKAFSLDTDKGQDDAINHKSMENLKESRDTMKQEQDKEIQRNTNNNFSIKNDFNISDEVISVDYNINQKMANNRNSRGAVSVRSQEILPVTGDGEDFLPALGIVCISILGILKRKTKN

NisT, Native Gene >gi|44044|emb|CAA48382.1| nisT [Lactococcus lactis]

(SEQ ID NO: 289)MDEVKEFTSKQFFYTLLTLPSTLKLIFQLEKRYAIYLIVLNAITAFVPLASLFIYQDLINSVLGSGRHLINIIIIYFIVQVITTVLGQLESYVSGKFDMRLSYSINMRLMRTTSSLELSDYEQADMYNIIEDVTQDSTYKPFQLFNAIIVELSSFISLLSSLFFIGTWNIGVAILLLIVPVLSLVLFLRVGQLEFLIQWQRASSERETWYIVYLLTHDFSFKEIKLNNISNYFIHKFGKLKKGFINQDLAIARKKTYFNIFLDFILNLINILTIFAMILSVRAGKLLIGNLVSLIQAISKINTYSQTMIQNIYIIYNTSLFMEQLFEFLKRESVVHKKIEDTEICNQHIGTVKVINLSYVYPNSNAFALKNINLSFEKGELTAIVGKNGSGKSTLVKIISGLYQPTMGIIQYDKMRSSLMPEEFYQKNISVLFQDFVKYELTIRENIGLSDLSSQWEDEKIIKVLDNLGLDFLKTNNQYVLDTQLGNWFQEGHQLSGGQWQKIALARTFFKKASIYILDEPSAALDPVAEKEIFDYFVALSENNISIFISHSLNAARKANKIVVMKDGQVEDVGSHDVLLRRCQYYQELYYSEQYEDNDE

NisB, NisP and NisT

As described above, the creation of SHM resistant (cold) versions of the essential genes NisP and NisT means that these genes will tend to mutate at a lower rate than SHM-susceptible genes that are targeted for diversity generation. Both NisP and NisT currently have broad specificity for Nisin and do not add to the potential diversity of the post-translationally modified peptide. In this initial example, NisB is also made SHM-resistant; however, it could also be selectively mutated following the same guidelines outlined below for NisA. Native and SHM-resistant polynucleotides of NisB are provided in FIGS. 39A and 40A, respectively, together with the initial analysis of hot spot and cold spot frequency (FIGS. 39B and 40B, respectively). Native and SHM-resistant polynucleotides of NisP are provided in FIGS. 41A and 42A, respectively, together with the initial analysis of hot spot and cold spot frequency (FIGS. 41B and 42B, respectively).

NisA Peptide

As shown above, the majority of the leader peptide region of the NisA peptide can be made resistant to AID-mediated mutagenesis because this sequence is absolutely necessary for substrate recognition by NisBCPT. The bulk of the remainder of the NisA peptide sequence can be made susceptible to AID-mediated mutagenesis, or alternatively, as shown above, key residues involved in the generation of the lanthionines can be made SHM-resistant, thereby reducing the rate of their mutagenesis.

Corresponding unmodified and SHM-resistant (cold) polynucleotide versions of the NisA polynucleotide sequence are shown in FIGS. 44A and 44D, respectively. The initial analysis of the unmodified hot spot and cold spot frequency (FIGS. 44B and 44C, respectively) is compared to the SHM-resistant hot spot and cold spot frequency (FIGS. 44E and 44F, respectively). Codon optimization of NisA results in the creation of 20 cold spots and elimination of all but one hot spot in the leader sequence, and the creation of 17 hot spots, compared to 8 hot spots in the wild type sequence, in the rest of the molecule.

NisC Protein

Regions of NisC involved in substrate recognition and cyclization, such as those outlined above as bold residues, and as shown in FIG. 47B, can be made hot (susceptible) to AID-mediated mutation, so that they have a greater probability of generating mutants with alternate activities and specificities, thereby creating mature Nisin molecules with altered modifications and bioactivity. Structural areas that govern only stability of the protein can be made cold. Corresponding unmodified and SHM-resistant (cold) polynucleotide versions of the NisC polynucleotide sequence are shown in FIGS. 45A and 46A, respectively. The initial analysis of the unmodified hot spot and cold spot frequency (FIGS. 45B and 45C, respectively) is compared to the SHM-resistant hot spot and cold spot frequency (FIGS. 46B and 46C, respectively).

A specific example of the creation of a targeted hot spot in this gene is shown below:

In this example, using SHMredesign, an additional hot spot has been inserted into the region of interest (LSTG) and a cold spot has been removed. Additionally, the flanking sequence has been made significantly more SHM resistant.

(SEQ ID NO: 290)
..N..F..D..F..G..E..L..T.. L..S..T..G..L..P..G amino acid sequence

Native polynucleotide sequence:
HhhhhhhhhhhhhhhhHhhhhhhhhhhhhHhhhhhhhhhhhhHhh hot spots
cccccccCcccccCCcCccccCcCcCcCccccccCCccccccccc cold spots

Optimized polynucleotide sequence:
HhhhhhhhhhhhhhhhHhhhhhhhhhhHhHhhhhhhhhhhhhhh hot spots
ccccccCcccccCCcCccCcccCcccCcccccccCcCcCCccCc cold spots
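The hot spot and cold spot tallies referred to in this example can be illustrated with a simple motif count. The sketch below is only an illustration: it assumes the commonly cited AID hot spot motif WRCY (with its reverse complement RGYW) and cold spot motif SYC (with its reverse complement GRS), and the function names and counting convention are hypothetical. It is not the SHMredesign procedure itself.

```python
import re

# Assumed motif definitions (IUPAC codes expanded to regex character classes).
# W = A/T, R = A/G, Y = C/T, S = G/C.
HOT_MOTIFS = ("[AT][AG]C[CT]", "[AG]G[CT][AT]")  # WRCY and its reverse complement RGYW
COLD_MOTIFS = ("[GC][CT]C", "G[AG][GC]")         # SYC and its reverse complement GRS


def count_motifs(seq, motifs):
    """Count overlapping occurrences of each motif in seq and sum them."""
    seq = seq.upper()
    # A lookahead lets the regex engine report overlapping matches.
    return sum(len(re.findall(f"(?=({m}))", seq)) for m in motifs)


def hot_cold_profile(seq):
    """Return hot spot and cold spot motif counts for a polynucleotide."""
    return {
        "hot": count_motifs(seq, HOT_MOTIFS),
        "cold": count_motifs(seq, COLD_MOTIFS),
    }
```

Comparing `hot_cold_profile(native)` with `hot_cold_profile(optimized)` for two synonymous coding sequences gives a rough view of how a redesign shifts motif density. A palindromic site such as AGCT matches both the WRCY and the RGYW pattern, so it is counted once per strand under this convention.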

After final review to ensure that the synthetic polynucleotide sequence is free of extraneous restriction sites, the complete synthetic polynucleotide sequences can be synthesized (DNA 2.0, Menlo Park, Calif.), cloned into appropriate cloning vectors, and sequenced to confirm correct synthesis.
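The review for extraneous restriction sites can be sketched as a scan of the designed sequence against a table of recognition sequences. The sketch below is illustrative only; the function name and the choice of enzymes in the table (a subset of those used for cloning in the vector examples) are assumptions. All sites listed are palindromic, so scanning one strand is sufficient for them.

```python
# Recognition sequences (5'->3') for some enzymes used in the cloning steps.
# The selection of enzymes is an assumption made for illustration.
SITES = {
    "AflII": "CTTAAG",
    "BstBI": "TTCGAA",
    "AgeI": "ACCGGT",
    "SacI": "GAGCTC",
    "SpeI": "ACTAGT",
    "NheI": "GCTAGC",
    "SbfI": "CCTGCAGG",
    "BsiWI": "CGTACG",
    "ClaI": "ATCGAT",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Advance by one base so overlapping occurrences are also reported.
            start = seq.find(motif, start + 1)
        if positions:
            hits[name] = positions
    return hits
```

An empty result from `find_sites(design)` indicates that none of the listed sites occurs in the design; any reported position would need to be removed by a synonymous codon change before synthesis.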

Synthetic genes can then be introduced into expression vectors and transformed into an appropriate bacterial strain, for example a Lactococcus lactis strain as previously described (Mota-Meira et al., FEBS Lett. 1997 Jun. 30; 410(2-3):275-9), together with AID (Besmer et al., Mol Cell Biol. 2006 June; 26(11):4378-85) or an AID homolog such as an Apobec-1 enzyme.

Screening can be accomplished by allowing the SHM-generated diversity to evolve in L. lactis co-cultured with Gram-positive bacterial targets that are currently poorly targeted by Nisin. Eventually, strains of L. lactis will evolve that comprise mutated Nisin genes with enhanced activity against the chosen bacterial target.

Mass spectroscopy of the supernatant of evolved cell cultures can be used to assess the progress of the process (i.e., to identify novel lantibiotics with improved activity against a pathogen).

Example 7 Application of SHM to the Generation of Improved Receptors

Receptors bind ligands and encompass a broad genus of polypeptides including, but not limited to, cell-bound receptors such as antibodies (B cell receptors), T cell receptors, Fc receptors, G protein-coupled receptors, cytokine receptors, etc.

Fc receptors (FcR) are a family of related receptors that are specific for the Fc portion of immunoglobulin (Ig) molecules. Receptors have been defined for each of the immunoglobulin classes and, as such, are defined by the class of Ig that they bind: Fc gamma receptors (FcγR) bind gamma immunoglobulin (IgG), Fc epsilon receptors (FcεR) bind epsilon immunoglobulin (IgE), and Fc alpha receptors (FcαR) bind alpha immunoglobulin (IgA).

Fcγ receptors are expressed on most hematopoietic cells and, via binding of IgG, play an important role in homeostasis of the immune system and unmodified host protection against infection. Three subfamily members of FcγR receptors have been identified: (1) FcγRI, which is a high affinity receptor for IgG; (2) FcγRII, which is a low affinity receptor for IgG that avidly binds to aggregates of immune complexes; and (3) FcγRIII, which is a low affinity receptor that binds to immune complexes. In spite of being structurally related, the receptors perform different functions.

Three subclasses of FcγR have been identified: FcγRI (CD64), FcγRII (CD32) and FcγRIII (CD16). Because each FcγR subclass is encoded by two or three genes, and alternative RNA splicing leads to multiple transcripts, a broad diversity in FcγR isoforms exists. The three genes encoding the FcγRI subclass (FcγRIA, FcγRIB and FcγRIC) are clustered in region 1q21.1 of the long arm of chromosome 1; the genes encoding FcγRII isoforms (FcγRIIA, FcγRIIB and FcγRIIC) and the two genes encoding FcγRIII (FcγRIIIA and FcγRIIIB) are all clustered in region 1q22. These different FcR subtypes are expressed on different cell types (reviewed in Ravetch and Kinet, Annu. Rev. Immunol. 9:457-492 (1991)). For example, in humans, FcγRIIIB is found only on neutrophils, whereas FcγRIIIA is found on macrophages, monocytes, natural killer (NK) cells, and a subpopulation of T-cells. Notably, FcγRIIIA is the only FcR present on NK cells, one of the cell types implicated in ADCC.

FcγRI, FcγRII and FcγRIII are immunoglobulin superfamily (IgSF) receptors; FcγRI has three IgSF domains in its extracellular domain, while FcγRII and FcγRIII have only two IgSF domains in their extracellular domains.

Another type of Fc receptor is the neonatal Fc receptor (FcRn). FcRn is structurally similar to major histocompatibility complex (MHC) and consists of an α-chain non-covalently bound to β2-microglobulin.

The binding site on human and murine antibodies for FcγR has been previously mapped to the so-called “lower hinge region” consisting of residues 233-239 (EU index numbering as in Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md. (1991)). See Woof et al. Molec. Immunol. 23:319-330 (1986); Duncan et al. Nature 332:563 (1988); Canfield and Morrison, J. Exp. Med. 173:1483-1491 (1991); Chappel et al., Proc. Natl. Acad. Sci USA 88:9036-9040 (1991).

Other amino acid residues considered to be involved in binding to FcγR include: G316-K338 (human IgG) for human FcγRI (Woof et al. Molec. Immunol. 23:319-330 (1986)); K274-R301 (human IgG1) for human FcγRIII (based on peptides) (Sarmay et al. Molec. Immunol. 21:43-51 (1984)); Y407-R416 (human IgG) for human FcγRIII (based on peptides) (Gergely et al. Biochem. Soc. Trans. 12:739-743 (1984)); as well as N297 and E318 (murine IgG2b) for murine FcγRII (Lund et al., Molec. Immunol., 29:53-59 (1992)). In one case, Pro331 in IgG3 is mutated to Ser, and the affinity of this modified IgG3 to target cells is analyzed: the affinity is found to be six fold lower than that of unmutated IgG3, indicating the involvement of Pro331 in FcγRI binding. Morrison et al., Immunologist, 2:119-124 (1994); and Canfield and Morrison, J. Exp. Med. 173:1483-91 (1991).

Compounds that interfere with the dimerization interface between two FcγRII proteins can affect cellular signal transduction through one or both of the FcR proteins. Specifically, amino acid residues 117-131 and 150-164 of FcγRII are thought to be the interfacial area of the FcγRIIa dimer, and compounds which can mimic or bind to these regions are considered to be good binding modulators (see U.S. Pat. No. 6,835,753).

In one aspect, synthetic polynucleotides encoding at least a portion of the Fc regions of a plurality of IgG molecules can be modified to increase susceptibility of that portion to somatic hypermutation using the methods described herein to create modified IgG molecules exhibiting increased binding affinity for this region of FcγRIIa.

In another aspect, polypeptides encoded by such SHM-modified polynucleotides are considered herein. In one embodiment, the SHM-modified IgG includes a synthetic polynucleotide encoding the hexapeptide sequence Phe121 to Ser126 or shorter segments spanning a region with significant hydrogen bonding interactions, in which the polynucleotide has been optimized for SHM. Such Ig molecules are suitable modulators of dimerization between two FcγRIIa molecules.

The upper portion of the FG loop of FcR has been shown to be involved in Ig binding as demonstrated by mutagenesis studies. The FG peptide strand contains an extended β-sheet which projects the amino acid side chains in the FG loop in a defined orientation such as described in U.S. Pat. No. 6,675,105, entitled “3 Dimensional Structure, and Models of Fc Receptors and Uses Thereof.” Molecules which can act as β-turn mimics and which present their side chains at the top of the FG loop in the same way as those in the receptor have also been found to be effective in modulating FcR activities.

In one aspect, polynucleotides encoding an FG peptide strand containing an extended β-sheet which projects the amino acid side chains in the FG loop in a defined orientation can be modified to increase susceptibility to somatic hypermutation using the methods described herein to create modified polypeptides exhibiting increased binding affinity for this region of FcR. In another aspect, polypeptides encoded by such SHM-modified polynucleotides are considered herein.

Fc receptors play roles in normal immunity and resistance to infection and provide the humoral immune system with a cellular effector arm. Binding of an Ig gamma (IgG) to FcγR can lead to disease indications that are associated with regulation by FcγR. For example, the autoimmune disease thrombocytopenic purpura involves tissue (platelet) damage resulting from FcγR-dependent IgG immune complex activation of platelets or their destruction by FcγR+ phagocytes. In addition, various inflammatory diseases are known to involve IgG immune complexes (e.g., rheumatoid arthritis and systemic lupus erythematosus), including type II and type III hypersensitivity reactions. Type II and type III hypersensitivity reactions are mediated by IgG, which can activate either complement-mediated or phagocytic effector mechanisms, leading to tissue damage.

Due to the role of FcRs in a variety of biological mechanisms, there is a need for compounds which affect the binding of immunoglobulins to FcR and which can be used to treat a variety of illnesses associated with regulation by FcRs.

Fc receptor modulators (modulators of Fc receptor binding of immunoglobulins) can modulate FcαR, FcεR and FcγR polypeptides. Polynucleotides encoding Fc receptor modulators of FcαR, FcεR and FcγR polypeptides can be modified to increase and/or decrease susceptibility of one or more portions of the polypeptide to somatic hypermutation using the methods described herein. Modified Fc receptor modulators made using such methods can be assayed to identify modulators exhibiting modified binding and activity. In one aspect, the FcR modulator interacts with a FcγRI, a FcγRII and/or a FcγRIII. In a further aspect, the FcR modulator interacts with a FcγRIIa, a FcγRIIb and/or a FcγRIIc. Fc receptor modulators provided herein can be used in a variety of applications including treatment or diagnosis of any disease where aggregates of antibodies are produced and where immune complexes are produced by contact of antibody with intrinsic or extrinsic antigen. Non-limiting conditions for which the Fc receptor modulators can be used in treatment or diagnosis include immune complex diseases; autoimmune diseases including but not limited to rheumatoid arthritis, systemic lupus erythematosus, immune thrombocytopenia, neutropenia, hemolytic anemias; vasculitides including but not limited to polyarteritis nodosa, systemic vasculitis; xenograft rejection; and infectious diseases where FcR uptake of virus enhances infection including but not limited to flavivirus infections such as Dengue virus-dengue hemorrhagic fever and measles virus infection. The Fc receptor modulators can also be used to reduce IgG-mediated tissue damage and to reduce inflammation.

The SHM-modified Fc receptor modulators presented herein can also enhance leukocyte function by enhancing FcR function such as antibody dependent cell mediated cytotoxicity, phagocytosis and release of inflammatory cytokines. Treatments and diagnosis for enhanced FcR function include any infection where normal antibodies are produced to remove the pathogen; and any disease requiring FcR function where unmodified or recombinant antibodies can be used in treatment such as cancer and infections. For example, an immunoglobulin (e.g., normal Ig or SHM-modified Ig) can be administered in combination with a Fc receptor modulator (e.g., a SHM modified modulator) to enhance the effect of the immunoglobulin treatment.

In another aspect, FcR are involved in the complement pathway via C1q binding. C1q and two serine proteases (C1r and C1s) form the complex C1, the first component of the Complement Dependent Cytotoxicity (CDC) pathway. To activate the complement cascade, it is necessary for C1q to bind to at least two molecules of IgG1, IgG2, or IgG3, but only one molecule of IgM, attached to an antigenic target (Ward and Ghetie, Therapeutic Immunology 2:77-94 (1995) at page 80). Based upon the results of chemical modifications and crystallographic studies, Burton et al. (Nature, 288:338-344 (1980)) proposed that the binding site for the complement subcomponent C1q on IgG involves the last two (C-terminal) β-strands of the CH2 domain. Burton later proposed (Molec. Immunol., 22(3):161-206 (1985)) that the region including amino acid residues 318 to 337 might be involved in complement fixation. In one aspect, provided herein are SHM-modified Ig molecules (e.g., IgG1, IgG2, IgG3 or IgM) that have been modified such that they exhibit increased affinity for the antigenic target, increased binding to C1q, or both. Such modified Ig molecules can be tested in one or more art-recognized assays to evaluate changes in binding and/or biological activity compared to the starting Ig.

Assays for testing a SHM-modified Fc receptor modulator include those known in the art for testing compounds that modulate Fc receptor activity such as, for example, binding assays, platelet aggregation inhibition assays, assessment of ADCC activity, assessment of C1q binding, as well as other assays to test binding and function as described below. Binding and activity of SHM-modified Fc receptor modulators can be compared to control Ig molecules.

Binding Assay

In one example, the interaction between recombinant soluble FcγRIIa and human immunoglobulin in the presence of SHM-modified Fc receptor modulators is investigated using a BIAcore 2000 biosensor (Pharmacia Biotech, Uppsala, Sweden) at 22° C. in Hepes buffered saline [HBS: 10 mM Hepes (pH 7.4), 150 mM NaCl, 3.4 mM EDTA, 0.005% Surfactant P20 (Pharmacia)]. Monomeric human IgG1, IgG3, and IgE (50 μg/mL) (non-specific binding control) are covalently coupled to the carboxymethylated dextran surface of the CM-5 sensor-chip (BIAcore, Uppsala, Sweden) using the amine coupling protocol (BIAcore, Uppsala, Sweden). An additional channel is chemically treated using the coupling protocol. Recombinant soluble FcγRIIa is used at a concentration of 125 μg/mL, which is equivalent to 50% binding capacity. Recombinant soluble FcγRIIa is preincubated with each of the SHM-modified Fc receptor modulators at room temperature for 30 minutes before being injected over the sensor-chip surface for 1 minute at 10 μL/min followed by a 3 minute dissociation phase. All surfaces are regenerated with 50 mM diethylamine (about pH 11.5), 1 M NaCl between each of the compounds being analyzed. The maximum response for each interaction is measured. Non-specific binding responses (IgE channel) are subtracted from binding to IgG1 and IgG3. Measurements are corrected for differences in buffer composition between the SHM-modified Fc receptor modulators and receptor.

Platelet Aggregation Inhibition

Platelet aggregation inhibition can be tested using art-recognized assays such as described herein. Briefly, the procedure involves adding a test compound or a control compound to a mixture of the platelets and HAGG. This procedure illustrates the ability of the test compound, compared to controls, to inhibit platelet aggregate formation as well as its ability to break apart platelet aggregates which have formed prior to the addition of the compound.

Platelets express a single class of gamma receptors, FcγRIIa. Following the cross-linking of FcγRIIa, platelets undergo a variety of biochemical and cellular modifications that culminate in aggregation. The capacity of the compounds to inhibit platelet activation is measured using an assay that specifically measures platelet aggregation.

Briefly, platelets are isolated as follows: 30 mL of fresh whole blood is collected into citrated collection vials and centrifuged at 1000 rpm for ten minutes. The platelet rich plasma is separated and centrifuged at 2000 rpm for five minutes in four tubes. The supernatants are removed and the platelets are gently re-suspended in 2 mL of Tyrodes buffer per tube (137 mM NaCl, 2.7 mM KCl, 0.36 mM NaH₂PO₄, 0.1% dextrose, 30 mM sodium citrate, 1.0 mM MgCl₂.6H₂O, pH 6.5) and centrifuged again at 2000 rpm for five minutes. The supernatants are again removed and platelets are re-suspended in 0.5 mL of Hepes containing Tyrodes buffer per tube (137 mM NaCl, 2.7 mM KCl, 0.36 mM NaH₂PO₄, 0.1% dextrose, 5 mM Hepes, 2 mM CaCl₂, 1.0 mM MgCl₂.6H₂O, pH 7.35). The platelet count is determined using a haematology analyzer (Coulter) and adjusted to a concentration of approximately 100×10⁵ platelets/mL using the Hepes containing Tyrodes buffer.

For each aggregation experiment, a mixture of 50 μL of a Fc receptor agonist, heat aggregated gamma globulin (“HAGG,” 200 μg/mL) or collagen (2 μg/mL), is incubated with 50 μL of phosphate buffered saline (“PBS”: 3.5 mM NaH₂PO₄, 150 mM NaCl) or BRI compound (5 mg/mL in PBS) for 60 minutes at room temperature. The assay is then performed using a two cell aggregometer at 37° C. as follows: glass cuvettes are placed in holders and pre-warmed to 37° C. and 400 μL of the platelet suspension is added. After a stable baseline is reached, 100 μL of HAGG:PBS, HAGG:BRI compound or collagen:PBS, collagen:BRI compound are added to the platelet suspension. The subsequent aggregation of the platelets is monitored for 15 minutes or until aggregation is complete. The rate of aggregation is determined by measuring the gradient of the aggregation slope.
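The slope measurement can be illustrated with a short calculation. The sketch below assumes that the aggregometer trace is available as paired time and signal readings and that the rate is taken as the steepest rise between consecutive readings; both the data layout and this definition of the gradient are assumptions made for illustration, and the function name is hypothetical.

```python
def max_aggregation_rate(times_s, signal):
    """Return the steepest rise between consecutive readings (signal units per second).

    times_s: strictly increasing time points in seconds.
    signal:  aggregometer readings (e.g., percent light transmission) at those times.
    """
    if len(times_s) != len(signal) or len(times_s) < 2:
        raise ValueError("need two or more paired readings")
    rates = []
    for i in range(len(times_s) - 1):
        dt = times_s[i + 1] - times_s[i]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Finite-difference slope over one sampling interval.
        rates.append((signal[i + 1] - signal[i]) / dt)
    return max(rates)
```

Comparing the value obtained for HAGG:PBS with that for HAGG:BRI compound gives a simple measure of inhibition; a fitted slope over a longer window could be substituted where the trace is noisy.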

Assessment of ADCC Activity

To assess ADCC activity of a SHM-modified Fc receptor modulator, an in vitro ADCC assay can be performed using varying effector:target ratios. Useful “effector cells” for such assays include peripheral blood mononuclear cells (PBMC) and Natural Killer (NK) cells. Alternatively, or additionally, ADCC activity of the polypeptide variant can be assessed in vivo, e.g., in an animal model such as that disclosed by Clynes et al. PNAS (USA) 95:652-656 (1998).

For example, to prepare chromium 51-labeled target cells, tumor cell lines are grown in tissue culture plates and harvested using sterile 10 mM EDTA in PBS. SK-BR-3 cells, a 3+ HER2-overexpressing human breast cancer cell line, are used as targets in all assays. The detached cells are washed twice with cell culture medium. Cells (5×10⁶) are labeled with 200 μCi of chromium⁵¹ (New England Nuclear/DuPont) at 37° C. for one hour with occasional mixing. Labeled cells are washed three times with cell culture medium, then are re-suspended to a concentration of 1×10⁵ cells/mL. Cells are used either without opsonization, or are opsonized prior to the assay by incubation with rhuMAb HER2 wild-type (HERCEPTIN™) or SHM-modified FcR modulators in PBMC assay or in NK assay.

Peripheral blood mononuclear cells are prepared by collecting blood on heparin from normal healthy donors and dilution with an equal volume of phosphate buffered saline (PBS). The blood is then layered over LYMPHOCYTE SEPARATION MEDIUM™ (LSM: Organon Teknika) and centrifuged according to the manufacturer's instructions. Mononuclear cells are collected from the LSM-plasma interface and are washed three times with PBS. Effector cells are suspended in cell culture medium to a final concentration of 1×10⁷ cells/mL.

After purification through LSM, natural killer (NK) cells are isolated from PBMCs by negative selection using an NK cell isolation kit and a magnetic column (Miltenyi Biotech) according to the manufacturer's instructions. Isolated NK cells are collected, washed and re-suspended in cell culture medium to a concentration of 2×10⁶ cells/mL. The identity of the NK cells is confirmed by flow cytometric analysis.

Varying effector:target ratios are prepared by serially diluting the effector (either PBMC or NK) cells two-fold along the rows of a microtiter plate (100 μL final volume) in cell culture medium. The concentration of effector cells ranges from 1.0×10⁷/mL to 2.0×10⁴/mL for PBMC and from 2.0×10⁶/mL to 3.9×10³/mL for NK. After titration of effector cells, 100 μL of chromium 51-labeled target cells (opsonized or non-opsonized) at 1×10⁵ cells/mL are added to each well of the plate. This results in an initial effector:target ratio of 100:1 for PBMC and 20:1 for NK cells. All assays are run in duplicate, and each plate contains controls for both spontaneous lysis (no effector cells) and total lysis (target cells plus 100 μL of 1% sodium dodecyl sulfate, 1 N sodium hydroxide). The plates are incubated at 37° C. for 18 hours, after which the cell culture supernatants are harvested using a supernatant collection system (Skatron Instrument, Inc.) and counted in a Minaxi auto-gamma 5000 series gamma counter (Packard) for one minute. Results are then expressed as percent cytotoxicity using the formula:

% Cytotoxicity = (sample cpm − spontaneous lysis)/(total lysis − spontaneous lysis) × 100

Four-parameter curve-fitting is then used to evaluate the data (KaleidaGraph 3.0.5).
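The percent cytotoxicity formula and the two-fold effector titration can be illustrated with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the default of ten wells per titration is an assumption inferred from the stated concentration ranges (1.0×10⁷/mL halved nine times gives about 2.0×10⁴/mL). Because effector and target cells are added in equal 100 μL volumes, the effector:target ratio reduces to the ratio of the two concentrations.

```python
def percent_cytotoxicity(sample_cpm, spontaneous_cpm, total_cpm):
    """Percent cytotoxicity from chromium-51 release counts (cpm)."""
    window = total_cpm - spontaneous_cpm
    if window <= 0:
        raise ValueError("total lysis must exceed spontaneous lysis")
    return (sample_cpm - spontaneous_cpm) / window * 100.0


def effector_target_ratios(start_effector_per_ml, target_per_ml=1e5, wells=10):
    """Effector:target ratios along a two-fold serial dilution.

    Equal well volumes of effector and target suspensions are assumed,
    so the cell ratio equals the concentration ratio.
    """
    ratios = []
    conc = start_effector_per_ml
    for _ in range(wells):
        ratios.append(conc / target_per_ml)
        conc /= 2.0  # two-fold dilution into the next well
    return ratios
```

For example, `effector_target_ratios(1.0e7)` starts at 100:1 for PBMC and `effector_target_ratios(2.0e6)` starts at 20:1 for NK cells, matching the initial ratios stated above.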

C1q Binding

The ability of the variant to bind C1q and mediate complement dependent cytotoxicity (CDC) can be assessed. To determine C1q binding, a C1q binding ELISA can be performed. Briefly, assay plates are coated overnight at 4° C. with SHM modified Fc receptor modulator or control polypeptide in coating buffer. The plates are then washed and blocked. Following washing, an aliquot of human C1q is added to each well and incubated for 2 hrs at room temperature. Following a further wash, 100 μl of a sheep anti-complement C1q peroxidase conjugated antibody is added to each well and incubated for 1 hour at room temperature. The plate is then washed with wash buffer and 100 μl of substrate buffer containing OPD (o-phenylenediamine dihydrochloride (Sigma)) is added to each well. The oxidation reaction, observed by the appearance of a yellow color, is allowed to proceed for 30 minutes and stopped by the addition of 100 μl of 4.5 N H₂SO₄. The absorbance is then read at (492-405) nm.

Binding and Biological Activity

The SHM modified Fc receptor modulator can then be subjected to one or more assays to evaluate any change in binding and biological activity compared to the starting polypeptide.

In one example, the SHM modified Fc receptor modulator essentially retains the ability to bind receptor compared to the non-modified polypeptide, i.e. the binding capability is no more than about 20-fold worse, e.g. no more than about 5-fold worse, than that of the non-modified polypeptide. The binding capability of the SHM modified Fc receptor modulator is determined using techniques such as fluorescence activated cell sorting (FACS) analysis or radioimmunoprecipitation (RIA), for example.

To determine receptor binding, a polypeptide comprising at least the binding domain of the receptor of interest (e.g. the extracellular domain of an α subunit of an FcR) is coated on solid phase, such as an assay plate. The binding domain of the receptor alone or a receptor-fusion protein is coated on the plate using standard procedures. Examples of receptor-fusion proteins include receptor-glutathione S-transferase (GST) fusion protein, receptor-chitin binding domain fusion protein, and receptor-hexaHis tag fusion protein (coated on glutathione, chitin, and nickel coated plates, respectively). Alternatively, a capture molecule is coated on the assay plate and used to bind the receptor-fusion protein via the non-receptor portion of the fusion protein. Examples include anti-hexaHis F(ab′)₂ coated on the assay plate used to capture a receptor-hexaHis tail fusion, or anti-GST antibody coated on the assay plate used to capture a receptor-GST fusion. In other embodiments, binding to cells expressing at least the binding domain of the receptor is evaluated. The cells can be naturally occurring hematopoietic cells that express the FcR of interest or can be transformed with nucleic acid encoding the FcR or a binding domain thereof such that the binding domain is expressed at the surface of the cell to be tested.

The immune complex described herein above is added to the receptor-coated plates and incubated for a sufficient period of time such that the polypeptide binds to the receptor. Plates are then washed to remove unbound complexes, and binding of the analyte is detected according to known methods. For example, binding is detected using a reagent (e.g. an antibody or fragment thereof which binds specifically to the analyte, and which is optionally conjugated with a detectable label; detectable labels and methods for conjugating them to polypeptides are described below in the section entitled “Non-Therapeutic Uses for the Polypeptide Variant”).

Low Affinity Receptor Binding Assay

Binding of an IgG Fc region to recombinant FcγRIIA, FcγRIIB and FcγRIIIA α subunits expressed as His6-glutathione S transferase (GST)-tagged fusion proteins can be determined. Since the affinity of the Fc region of IgG1 for the FcγRI is in the nanomolar range, the binding of SHM-modified IgG1 Fc can be measured by titrating monomeric IgG and measuring bound IgG with a polyclonal anti-IgG in a standard ELISA format. The affinity of the other members of the FcγR family, i.e. FcγRIIA, FcγRIIB and FcγRIIIA, for IgG is, however, in the micromolar range, and binding of monomeric IgG1 to these receptors cannot be reliably measured in an ELISA format.

FcγR Binding ELISAs

FcγRI α subunit-GST fusion is coated onto Nunc F96 maxisorb plates (cat no. 439454) by adding 100 μl of receptor-GST fusion at 1 μg/ml in PBS and incubating for 48 hours at 4° C. Prior to assay, plates are washed 3× with 250 μl of wash buffer (PBS pH 7.4 containing 0.5% TWEEN 20) and blocked with 250 μl of assay buffer (50 mM Tris buffered saline, 0.05% TWEEN 20, 0.5% RIA grade bovine albumin (Sigma A7888), and 2 mM EDTA pH 7.4). Samples diluted to 10 μg/ml in 1 ml of assay buffer are added to FcγRI α subunit coated plates and incubated for 120 minutes at 25° C. on an orbital shaker. Plates are washed 5× with wash buffer to remove unbound complexes, and IgG binding is detected by adding 100 μl horseradish peroxidase (HRP) conjugated goat anti-human IgG heavy chain specific (Boehringer Mannheim 1814249) at 1:10,000 in assay buffer and incubating for 90 min at 25° C. on an orbital shaker. Plates are washed 5× with wash buffer to remove unbound HRP goat anti-human IgG, and bound anti-IgG is detected by adding 100 μl of substrate solution (0.4 mg/ml o-phenylenediamine dihydrochloride, Sigma P6912, 6 mM H₂O₂ in PBS) and incubating for 8 min at 25° C. The enzymatic reaction is stopped by the addition of 100 μl 4.5 N H₂SO₄ and the colorimetric product is measured at 490 nm on a 96 well plate densitometer (Molecular Devices). Binding of SHM-modified FcR modulators is expressed as a percent of the parent molecule (i.e., wild-type or previously modified).

FcRn Binding ELISA

For measuring FcRn binding activity of IgG variants, ELISA plates are coated with 2 μg/ml streptavidin (Zymed, South San Francisco) in 50 mM carbonate buffer, pH 9.6, at 4° C. overnight and blocked with PBS-0.5% BSA, pH 7.2 at room temperature for one hour. Biotinylated FcRn (prepared using biotin-X-NHS from Research Organics, Cleveland, Ohio and used at 1-2 μg/ml) in PBS-0.5% BSA, 0.05% polysorbate 20, pH 7.2, is added to the plate and incubated for one hour. Two-fold serial dilutions of IgG standard (1.6-100 ng/ml) or variants in PBS-0.5% BSA, 0.05% polysorbate 20, pH 6.0, are added to the plate and incubated for two hours. Bound IgG is detected using peroxidase labeled goat F(ab′)₂ anti-human IgG F(ab′)₂ in the above pH 6.0 buffer (Jackson ImmunoResearch, West Grove, Pa.) followed by 3,3′,5,5′-tetramethyl benzidine (Kirkegaard & Perry Laboratories) as the substrate. Plates are washed between steps with PBS-0.05% polysorbate 20 at either pH 7.2 or 6.0. Absorbance is read at 450 nm in a Vmax plate reader (Molecular Devices, Menlo Park, Calif.). Titration curves are fit with a four-parameter nonlinear regression curve-fitting program (KaleidaGraph, Synergy software, Reading, Pa.). Concentrations of IgG variants corresponding to the mid-point absorbance of the titration curve of the standard are calculated and then divided by the concentration of the standard corresponding to the mid-point absorbance of the standard titration curve.
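The mid-point comparison described above can be illustrated with a short Python sketch. It assumes one common parameterization of the four-parameter logistic curve (bottom, top, EC50, Hill slope); the exact form used by the curve-fitting program is not stated in the text, and the function names and fitted parameter values below are hypothetical.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic: absorbance as a function of concentration."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def conc_at_absorbance(absorbance, bottom, top, ec50, hill):
    """Invert the four-parameter logistic to get the concentration for a given absorbance."""
    return ec50 / ((top - bottom) / (absorbance - bottom) - 1.0) ** (1.0 / hill)


def relative_midpoint_concentration(standard_fit, variant_fit):
    """Variant concentration at the standard's mid-point absorbance,
    divided by the standard concentration at that same absorbance."""
    bottom, top, _, _ = standard_fit
    midpoint = (bottom + top) / 2.0
    variant_conc = conc_at_absorbance(midpoint, *variant_fit)
    standard_conc = conc_at_absorbance(midpoint, *standard_fit)
    return variant_conc / standard_conc


# Hypothetical fitted parameters: (bottom, top, EC50 in ng/ml, Hill slope).
standard = (0.05, 2.0, 10.0, 1.0)
variant = (0.05, 2.0, 5.0, 1.0)
print(relative_midpoint_concentration(standard, variant))
```

Under this parameterization, a ratio below 1 means the variant reaches the standard's mid-point absorbance at a lower concentration than the standard does.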

Example 8 Use of AID for Enzymatic Pathway Optimization

Using the SHM systems described herein, polynucleotides encoding one or more enzymes can be simultaneously modified via somatic hypermutation to increase the speed or efficiency of metabolic conversions. Enzymes involved in pathways of interest include, for example, those associated with yeast fermentation, antibiotics and clean-up of oil spills (see, for example, Ho, N. W. Y., Chen Z. and A. Brainard. 1998. Appl. and Environ. Microbiol. 64:1852-1859; and Sonderegger M. and U. Sauer. 2003. Appl. and Environ. Microbiol. 69:1990-1998).

In one aspect, to develop a commercially viable yeast fermentation system for converting plant-based cellulosic biomass to ethanol, Saccharomyces cerevisiae that have been genetically engineered to use xylose as a substrate can be further modified by step-wise induction of somatic hypermutation followed by selection for the ability to grow anaerobically using xylose as the sole carbon source present in the growth medium.

One advantage of ethanol is that it is a non-fossil biofuel that produces less pollution than gasoline. Another advantage is that ethanol can be produced from readily available, renewable cellulosic biomass from plant material. Cellulosic biomass is composed of hexose and pentose sugars (mainly glucose and xylose, respectively). A major obstacle to developing a commercial process for converting cellulosic biomass to ethanol, however, is that Saccharomyces species of yeast (the microorganisms currently used for large scale industrial production of ethanol from glucose) are currently unable to ferment xylose to ethanol with high yields and specific rates.

Metabolic engineering has been used successfully to develop strains of Saccharomyces cerevisiae that can ferment both xylose and arabinose to ethanol. However, none of the recombinant strains or any other naturally-occurring yeast strains have been able to grow anaerobically on xylose alone.

Provided herein is a method of somatic hypermutation of one or more of the genes of the xylose utilization pathway to create recombinant strains of anaerobic xylose-utilizing eukaryotes such as Saccharomyces cerevisiae that can grow anaerobically on xylose alone and ferment xylose and arabinose to ethanol.

In one aspect, step-wise somatic hypermutation can be used to develop a yeast strain that is capable of anaerobic growth on xylose alone. In one non-limiting embodiment, the starting strain is a strain of Saccharomyces cerevisiae that over-expresses the three enzymes from Pichia stipitis that act sequentially to convert xylose to xylulose 5-phosphate (a substrate that Saccharomyces species are able to ferment). The resulting xylose-utilizing yeast strain has utility for significantly improving ethanol fermentations and commercially viable ethanol production from plant-based cellulosic biomass.

Briefly, an inducible activation-induced cytidine deaminase (AID) SHM-resistant polynucleotide sequence is introduced into the starting yeast strain by stable chromosomal insertion, and then step-wise somatic hypermutation by AID is induced with monitoring of culture growth and xylose utilization between steps. The growth rate of the culture is monitored by measuring optical density at 600 nanometers. Xylose utilization is monitored using a commercially available enzymatic kit (Medichem, Steinenbronn, Germany) to measure xylose in the culture medium.

The next step is initiated when the culture growth rate and xylose utilization increase, for example, about 2-fold. An aerobic chemostat culture containing 5 grams xylose per liter and 1 gram glucose per liter is prepared. AID expression is induced for about 10, about 15, or about 20 generations (i.e., approximately 6-12 days). The number of generations for induction of AID expression can be determined based on reversion data, DT40 screening data (Cumbers et al. Nat. Biotechnol. 2002 November; 20(11): 1129-1134) and the yeast growth rate of 1.73 generations/day. At the point where the culture growth rate increases and the xylose is consumed, a culture aliquot is withdrawn and a new aerobic chemostat culture containing 5 grams per liter xylose as the sole carbon source is inoculated. AID expression is induced in this culture for about 10, about 15, or about 20 generations. Again, the culture is monitored for growth rate and xylose concentration. When a growing population that consumes the xylose is obtained, the aeration rate is dropped to less than 1 milliliter per minute and AID expression is again induced for about 10, about 15, or about 20 generations. When the growth rate of the culture stabilizes, a culture aliquot is withdrawn and a new chemostat culture containing 5 grams per liter xylose as the sole carbon source is inoculated and grown in the absence of any aeration and with induction of AID expression for about 10, about 15, or about 20 generations. The culture growth rate and xylose utilization are monitored.
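The relationship between the number of induced generations and the stated 6-12 day window follows from the quoted growth rate of 1.73 generations/day. A minimal Python sketch of that arithmetic is given here; the function and constant names are illustrative only.

```python
GENERATIONS_PER_DAY = 1.73  # yeast growth rate quoted in the text


def induction_days(generations, generations_per_day=GENERATIONS_PER_DAY):
    """Days of AID induction needed to reach a given number of generations."""
    return generations / generations_per_day


for generations in (10, 15, 20):
    days = induction_days(generations)
    print(f"{generations} generations: about {days:.1f} days")
```

Ten generations correspond to a little under 6 days and twenty generations to a little under 12 days, consistent with the approximate range given above.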

When the growth rate of the culture starts to increase and the xylose concentration decreases, a culture aliquot is withdrawn and a strict anaerobic batch culture in a 17 milliliter Pyrex glass tube sealed with a butyl rubber septum and plastic screw cap is inoculated with xylose as the sole carbon source. AID expression is induced for about 10, about 15, or about 20 generations and the culture is monitored for growth rate and xylose utilization. When the culture growth rate increases and the xylose concentration decreases, an aliquot of the population is plated on anaerobic minimal medium agar plates containing 20 grams per liter xylose as the sole carbon source. The plates are incubated at 30° Celsius in sealed jars using, for example, the GasPak Plus System (Becton Dickinson) to provide an anaerobic atmosphere. The anaerobic atmosphere is monitored using indicator strips (Becton Dickinson).

The fermentation performance of the parental strain, the evolved populations, and 15 clones isolated from the anaerobic xylose plates is compared in anaerobic batch cultures with 50 grams of glucose per liter and 50 grams of xylose per liter. The growth rate and the xylose and glucose utilization of the cultures are monitored. Glucose utilization is determined using a commercial kit (Beckman).

The last step of SHM is to take the clone that has the best growth and xylose utilization characteristics, induce AID expression, and grow it in multiple serial batch cultures for about 15 generations. Twenty clones isolated from this culture are then grown on xylose as the sole carbon source in strictly anaerobic culture conditions. The performance of the clones with the fastest growth rates is further evaluated in anaerobic xylose batch cultures.

Example 9 Application of SHM for Affinity Maturation of Antibodies

As described previously, antibodies provide an unmodified template through which SHM can be applied to create mutant proteins with enhanced properties. Such improved antibodies can be selected based upon affinity selection, for example via FACS or via binding to magnetic beads.

Antibodies directed towards hen egg lysozyme (HyHEL) represent an extremely well characterized system that enables the testing and optimization of the mutation and selection systems of the present invention. Specifically, the HyHEL antibodies enable the testing of a number of highly related antibodies that exhibit a well defined range of affinities and have characterized sequences and binding properties. For example, the following antibodies, and sequence variants thereof (muteins), have fully defined sequences starting from the germline sequence to the fully affinity mature antibody; see, e.g., Pons et al., (1999) Protein Science 8:958-68; and Smith-Gill et al., (1984) J. Immunology 132:963.

TABLE 16 Hen Egg Lysozyme antibody constructs (HyHEL)

Construct Name   Published Affinity   Heavy Chain Identity        Light Chain Identity
HyHEL10          0.03 nM              Heavy Chain (“wild type”)   Light Chain (“wild type”)
Mutein 1         66 nM                Heavy Chain (“wild type”)   Light Chain (Y50A)
Mutein 2         167 nM               Heavy Chain (“wild type”)   Light Chain (N32A)
Mutein 3         460 nM               Heavy Chain (“wild type”)   Light Chain (N31E)
Mutein 4         800 nM               Heavy Chain (Y33A)          Light Chain (“wild type”)
Mutein 5         7,000 nM             Heavy Chain (Y50A)          Light Chain (“wild type”)
Germline         unknown              (Germline sequence)         (Germline sequence)

To further optimize the system of the present invention, a range of high and low affinity HyHEL constructs are made and cloned into the expression vectors of the present invention. These vectors are then transfected into mutator cells expressing AID and selected using magnetic beads. Wild type HyHEL constructs are used as positive controls to optimize binding conditions and validate assay methodology.

A. Synthesis and Cloning of (“Wild Type”) HyHEL10 Heavy and Light Chain Constructs.

The prototypic HyHEL10 heavy chain and light chain expression vectors are created starting from the expression vector AB102 (as described previously), using standard molecular genetic manipulations as follows:

The puromycin resistance marker in AB102 is replaced with cold bsd or cold hyg using the NgoMIV and XbaI restriction sites, to generate the vectors AB187 and AB185, respectively.

A slightly longer, transcriptionally more robust version of the CMV promoter is exchanged for the original sequence found in AB102 using NheI (the mcs2 restriction site most proximal to the CMV promoter) and SbfI (the most CMV-proximal mcs1 site). The original AB102 CMV promoter included 553 bp of the unmodified CMV sequence upstream from the first T of the TATA box, while the AB187 version includes 645 bp upstream from the first T of the TATA box.

The nucleotide sequences for the “wild type” HyHEL heavy and light chains (as disclosed above) are synthesized at DNA 2.0 (Menlo Park, Calif.). For cloning purposes, the heavy chain is bordered by BglII and AscI, and the light chain is bounded by restriction sites SacI and AscI.

In order to express HyHEL10 IgG and its muteins on the cell surface, the heavy chain is created as a chimeric molecule with the following features:

Kozak consensus sequence; HyHEL10 heavy chain variable region; full-length murine IgG1 constant region; XhoI site; Murine H2kk (MHC type I) peri-transmembrane domain, transmembrane domain and cytoplasmic domain. The H2kk sequence is determined from accession number AK153419 at the National Center for Biotechnology Information (NCBI) nucleotide database.

The nucleotide sequence of the full length chimeric, cell-surface associated HyHEL10 heavy chain is as listed below:

In this sequence, the BglII site is underlined; Kozak sequence is underlined and italicized; stop codon is underlined and bolded; XhoI site is indicated by boxed nucleotides; Double underlined sequences are derived from H2kk. The AscI cloning site 3′ to the TGA stop codon is indicated by italicized nucleotides.

(SEQ ID NO: 291)AGATCTGCTTGAATCCGCGGATAAGAGGACTAGTATTCGTCTCACTAGGGAGAGCTC ACCACC ATGAACAAGTTGCTGTGCTGCGCGCTCGTGTTTCTGGACATCTCCATTAAGTGGACCACCCAGGACGTGCAGCTTCAGGAGTCAGGACCTAGCCTCGTGAAACCTTCTCAGACTCTGTCCCTCACCTGTTCTGTCACTGGCGACTCCATCACCAGTGATTACTGGAGCTGGATCCGGAAATTCCCAGGGAATAGACTTGAGTACATGGGGTACGTAAGCTACAGTGGTAGCACTTACTACAATCCATCTCTCAAAAGTCGAATCTCCATCACCCGAGACACATCCAAGAACCAGTACTACCTGGATTTGAATTTCTGTGACTACTGAGGACACAGCCACATATTACTGTGCAAACTGGGACGGTGATTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGCCAAAACGACACCCCCATCTGTCTATCCACTGGCCCCTGGATCTGCTGCCCAAACTAACTCCATGGTGACCCTGGGATGCCTGGTCAAGGGCTATTTCCCTGAGCCAGTGACAGTGACCTGGAACTCTGGATCCCTGTCCAGCGGTGTGCACACCTTCCCAGCTGTCCTGCAGTCTGACCTCTACACTCTGAGCAGCTCAGTGACTGTCCCCTCCAGCCCTCGGCCCAGCGAGACCGTCACCTGCAACGTTGCCCACCCGGCCAGCAGCACCAAGGTGGACAAGAAAATTGTGCCCAGGGATTGTGGTTGTAAGCCTTGCATATGTACAGTCCCAGAAGTATCATCTGTCTTCATCTTCCCCCCAAAGCCCAAGGATGTGCTCACCATTACTCTGACTCCTAAGGTCACGTGTGTTGTGGTAGACATCAGCAAGGATGATCCCGAGGTCCAGTTCAGCTGGTTTGTAGATGATGTGGAGGTGCACACAGCTCAGACGCAACCCCGGGAGGAGCAGTTCAACAGCACTTTCCGCTCAGTCAGTGAACTTCCCATCATGCACCAGGACTGGCTCAATGGCAAGGAGTTCAAATGCAGGGTCAACAGTGCAGGTTCCCCTGCCCCCATCGAGAAAACCATCTCCAAAACCAAAGGCAGACCGAAGGCTCCACAGGTGTACACCATTCCACCTCCCAAGGAGCAGATGGCCAAGGATAAAGTCAGTCTGACCTGCATGATAACAGACTTCTTCCCTGAAGACATTACTGTGGAGTGGCAGTGGAATGGGCAGCCAGCGGAGAACTACAAGAACACTCAGCCCATCATGAACACGAATGGCTCTTACTTCGTCTACAGCAAGCTCAATGTGCAGAAGAGCAACTGGGAGGCAGGAAATACTTTCACCTGCTCTGTGTTACATGAGGGCCTGCACAACCACCATACTGAGAAGAGCCTCTCCCACTC

TGGAGCTGCAATAGTCACTGGAGCTGTGGTGGCTTTTGTGATGAAGATGAGAAGGAGAAACACAGGTGGAAAAGGAGGGGACTATGCTCTGGCTCCAGGCTCCCAGACCTCTGATCTGTCTCTCCCAGATTGTAAAGTGATGGTTCATGACCCTCATTCTCTAGCG TGA GGCCGGCCAAGGCGCGCC.

The amino acid sequence of the chimeric, cell-surface associated HyHEL10 heavy chain is as listed below. The 2 amino acids (leu-glu) encoded by the synthetic XhoI site are marked by bold-and-underline; the bold-underline Glu also represents the most amino proximal amino acid of the H2kk domain; double underline indicates the putative transmembrane domain; asterisk indicates stop codon.

(SEQ ID NO: 292)MNKLLCCALVFLDISIKWTTQDVQLQESGPSLVKPSQTLSLTCSVTGDSITSDYWSWIRKFPGNRLEYMGYVSYSGSTYYNPSLKSRISITRDTSKNQYYLDLNSVTTEDTATYYCANWDGDYWGQGTLVTVSAAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQSDLYTLSSSVTVPSSPRPSETVTCNVAHPASSTKVDKKIVPRDCGCKPCICTVPEVSSVFIFPPKPKDVLTITLTPKVTCVVVDISKDDPEVQFSWFVDDVEVHTAQTQPREEQFNSTFRSVSELPIMHQDWLNGKEFKCRVNSAAFPAPIEKTISKTKGRPKAPQVYTIPPPKEQMAKDKVSLTCMITDFFPEDITVEWQWNGQPAENYKNTQPIMNTNGSYFVYSKLNVQKSNWEAGNTFTCSVLHEGLHNHHTEKSLSHSPGK LE PPPSTVSNMATVAVLVVLGAAIVTGAVVAFVMKMRRRNTGGKGGDYALAPGSQTSDLSLPDCKVMVHDPHSLA*.

The amino acid and nucleotide sequences of the (“wild type”) HyHEL kappa light chain are listed below.

Amino acid sequence of the HyHEL kappa light chain. Asterisk indicates stop codon.

(SEQ ID NO: 338)MNKLLCCALVFLDISIKWTTQDIVLTQSPATLSVTPGNSVSLSCRASQSIGNNLHWYQQKSHESPRLLIKYASQSISGIPSRFSGSGSGTDFTLSINSVETEDFGMYFCQQSNSWPYTFGGGTKLEIKRADAAPTVSIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTLTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC*.

The nucleotide sequence of the HyHEL kappa light chain. Start and stop codons are underlined. SacI and AscI cloning sites are bolded.

(SEQ ID NO: 293)GAGCTCACCACAATGAACAAGTTGCTGTGCTGCGCGCTCGTGTTTCTGGACATCTCCATTAAGTGGACCACCCAGGATATTGTGCTAACTCAGTCTCCAGCCACCCTGTCTGTGACTCCAGGAAATAGCGTCAGTCTITCCTGCAGGGCCAGCCAAAGTATTGGCAACAACCTACACTGGTATCAACAAAAATCACATGAGTCTCCAAGGCTTCTCATCAAGTATGCTTCCCAGTCCATCTCTGGGATCCCCTCCAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACTCTCAGTATCAACAGTGTGGAGACTGAAGATTTTGGAATGTATTTCTGTCAACAGAGTAACAGCTGGCCTTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGTTGA GGCGCGCC.
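The correspondence between the nucleotide and amino acid listings above can be spot-checked by translating short stretches of the coding sequence with the standard genetic code. The following Python sketch is illustrative only; the table construction and the function name are not part of the described method, and the fragments used are short excerpts copied from the light chain sequence and from the codon 30-33 sequences listed in Table 17.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string, stopping after the first stop codon (*)."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


# First ten codons of the light chain open reading frame (from the start codon).
print(translate("ATGAACAAGTTGCTGTGCTGCGCGCTCGTG"))
# Light chain codons 30-33: wild type (AAC at position 31) and the GGT variant.
print(translate("GGCAACAACCTA"), translate("GGCGGTAACCTA"))
```

The same function can be applied to the full open reading frame once any characters that are not A, C, G or T have been resolved against the original sequence listing; an unresolved character raises a KeyError rather than being silently translated.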

Muteins of the “wild type” heavy and light chains, as well as the germline sequence, as described below in Table 17, were created using site directed mutagenesis, and their sequences were confirmed by sequencing.

TABLE 17 Hen Egg Lysozyme antibody constructs with measured affinities

Mutations                   DNA Sequence               Kd               koff       kon
wt LC/wt HC                 GGC30-AAC31-AAC32-CTA33    3.93E−11         8.6E−05    2.2E+06

Light chain variants
LC G30(silent)N31A/wt HC    GGA30-GCT31-AAC32-CTA33    1.48E−09         8.29E−03   5.61E+06
N31G LC/wt HC               GGC30-GGT31-AAC32-CTA33    2.78E−09         1.21E−02   4.33E+06
N31S LC/wt HC               GGC30-AGC31-AAC32-CTA33    7.10E−10         9.70E−04   1.40E+06
N32S LC/wt HC               GGC30-AAC31-AGC32-CTA33    1.00E−10         1.90E−04   1.90E+06
N32G LC/wt HC               GGC30-AAC31-GGT32-CTA33    6.29E−10         2.85E−03   4.53E+06
N31SN32S/wt HC              GGC30-AGC31-AGC32-CTA33    2.50E−09         6.10E−03   2.40E+06
LC L33(silent)/wt HC        GGC30-AAC31-AAC32-TTA33    5.96E−11         9.33E−05   1.56E+06
N31D LC/wt HC               GGC30-GAT31-AAC32-CTA33    1.1E−10

Heavy chain variants
wt LC/Y50A HC               GGG49-GCC50-GTA51          Not detectable
wt LC/Y33A HC               GAT32-GCC33-TGG34          2.0E−08          4.45E−02   2.13E+06

Mixed heavy and light chain variants
LC N31G/Y33A HC             see above                  7.0E−06
LC N32G/Y33A HC             see above                  2.00E−08

Nucleotides in bold represent codons in which defined mutations were made to introduce SHM optimized codons to increase somatic hypermutation compared to the “wild type” (HyHEL10) sequence (“wt”), as defined below. LC=Light Chain; HC=Heavy Chain. Also shown are the measured affinities of each mutant, obtained via BIACORE analysis.

These positions have been previously shown to be important for binding and to have been naturally mutated from the corresponding germline sequence during somatic hypermutation. Specifically, the light chain sequence of HyHEL10 contains the residue Asn31 located within CDR1 that makes a thermodynamically important contact to the HEL antigen residue Lys96. The Gly31 mutant (codon GGT) of HyHEL10 has a measured dissociation constant of around 2.5 nM, whereas the Asp31 (codon GAT) mutant of HyHEL10 has a measured dissociation constant of around 110 pM, and the wild-type Asn31 (codon AAC) of HyHEL10 has a measured dissociation constant of around 30 pM.
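The affinities in Table 17 can be cross-checked against the listed rate constants, since for a simple 1:1 interaction the equilibrium dissociation constant is the ratio Kd = koff/kon. The Python sketch below does this for the rows of Table 17 that report all three values. It assumes the usual units (Kd in M, koff in 1/s, kon in 1/(M·s)), which the table itself does not state, and the variable and function names are illustrative.

```python
# (construct, reported Kd, koff, kon) copied from Table 17.
# Units are assumed: Kd in M, koff in 1/s, kon in 1/(M*s).
TABLE_17 = [
    ("wt LC/wt HC",              3.93e-11, 8.6e-05,  2.2e+06),
    ("LC G30(silent)N31A/wt HC", 1.48e-09, 8.29e-03, 5.61e+06),
    ("N31G LC/wt HC",            2.78e-09, 1.21e-02, 4.33e+06),
    ("N31S LC/wt HC",            7.10e-10, 9.70e-04, 1.40e+06),
    ("N32S LC/wt HC",            1.00e-10, 1.90e-04, 1.90e+06),
    ("N32G LC/wt HC",            6.29e-10, 2.85e-03, 4.53e+06),
    ("N31SN32S/wt HC",           2.50e-09, 6.10e-03, 2.40e+06),
    ("LC L33(silent)/wt HC",     5.96e-11, 9.33e-05, 1.56e+06),
    ("wt LC/Y33A HC",            2.0e-08,  4.45e-02, 2.13e+06),
]


def kd_from_rates(koff, kon):
    """Equilibrium dissociation constant for a 1:1 interaction."""
    return koff / kon


def relative_deviation(reported, computed):
    """Fractional difference between the reported and recomputed Kd."""
    return abs(computed - reported) / reported


wild_type_kd = TABLE_17[0][1]
for name, kd, koff, kon in TABLE_17:
    computed = kd_from_rates(koff, kon)
    fold = kd / wild_type_kd  # fold loss of affinity relative to wild type
    print(f"{name}: Kd={kd:.2e}, koff/kon={computed:.2e}, {fold:.1f}-fold vs wt")
```

Rows for which only a Kd is listed (for example N31D LC/wt HC) are omitted because the rate constants needed for the check are not given.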

B. Transfection of Cells

HEK 293 cells are plated at 4×10⁵/well in a 6-well microtiter dish. After 24 hrs., transfections are performed using Fugene6 reagent from Roche Applied Sciences (Indianapolis, Ind.) at a reagent-to-DNA ratio of 3 μL:1 μg DNA per well with the expression vectors AB187 and AB185 that comprised the HyHEL heavy and light chains and conferred blasticidin and hygromycin resistance, respectively. Transfections are carried out in accordance with the manufacturer's protocol.

C. Selection of Peptides

An unlabeled and biotinylated monomeric peptide sequence that comprises the majority of the hen egg lysozyme (HEL) binding surface is synthesized. Two dimeric peptide sequences are also synthesized to compare whether presenting the peptide as a dimer would enhance antibody binding by increasing the avidity of the antibody-peptide interaction. A tandem dimer and a branched multiple antigenic peptide (MAP) dimer are also tested. Peptides as well as biotinylated or unlabeled HEL protein are coupled to paramagnetic polystyrene microparticle surfaces that had been modified with functional groups or coated with streptavidin (Invitrogen, 1600 Faraday Ave., PO Box 6482, Carlsbad, Calif. 92008).

D. Coupling HEL Protein and Peptides to Tosylactivated Microparticles

The HEL protein and peptides are coupled to 2.8 micron Tosylactivated paramagnetic polystyrene microparticles in a 1.5 ml microcentrifuge tube (Nilsson K and Mosbach K. “p-Toluenesulfonyl chloride as an activating agent of agarose for the preparation of immobilized affinity ligands and proteins.” Eur. J. Biochem. 1980:112: 397-402). The microparticles (2e09 microparticles/milliliter) are washed and re-suspended in 100 mM borate buffer, pH 9.5 at a concentration of 1e09 microparticles/ml. Eleven nanomoles of peptide or 6 μg/ml HEL are added to the microparticles and the microparticle/peptide mixture is incubated at room temperature for at least 48 hours with slow tilt rotation. After incubation, the supernatant is removed and the microparticles are washed with 1 ml phosphate buffered saline solution (PBS), pH 7.2 containing 1% (weight/volume) BSA. Finally, the microparticles are re-suspended in 1 ml PBS solution, pH 7.2 containing 1% (weight/volume) BSA.

E. Coupling Biotinylated HEL Protein and Peptides to Streptavidin-Conjugated Microparticles

Another option is to couple biotinylated peptides to paramagnetic polystyrene microparticles whose surfaces have been covalently linked with a monolayer of streptavidin. Briefly, the streptavidin microparticles are washed and re-suspended in 1 ml PBS solution, pH 7.2 containing 1% (weight/volume) BSA, and 33 picomoles of biotinylated peptide or approximately 10 μg/ml biotinylated HEL are then added to the microparticle solution. The microparticle/peptide solution is incubated for 30 minutes at room temperature with slow tilt rotation. After coupling, the microparticles are washed and re-suspended to a final microparticle concentration of 1e09 microparticles/ml. (Argarana C E, Kuntz I D, Birken S, Axel R, Cantor C R. Molecular cloning and nucleotide sequence of the streptavidin gene. Nucleic Acids Res. 1986; 14(4):1871-82; Pahler A, Hendrickson W A, Gawinowicz Kolks M A, Argarana C E, Cantor C R. Characterization and crystallization of core streptavidin. J Biol Chem 1987:262(29):13933-7).

F. Cell Selection

Transfected HEK 293 cells expressing the 30 pM and 800 nM affinity HyHEL antibody heavy and light chains are screened in order to isolate cells that bind to the peptide-conjugated paramagnetic microparticles. A similar control cell line that did not express antibody is used as a negative control for the selections.

The cells are washed with an equal volume of PBS solution, pH 7.2 and re-suspended in PBS solution, pH 7.2 containing 1% (weight/volume) BSA to a final cell concentration of 1e07 cells/ml. The cells are pre-cleared by adding 1e06 naked microparticles to the cells and incubating on a rotator at 4° C. for 30 minutes. The unbound cells are gently transferred to a new tube. Peptide-conjugated or naked microparticles (1e07) are transferred into the tube with the cells and the cell:microparticle mixture is incubated on a rotator at 4° C. for 30 minutes. The unbound cells are then removed and the microparticle:cell mixture is washed with cold PBS/1% BSA. The microparticles and attached cells are re-suspended in 100 μl cell culture medium and grown initially in one well of a 96-well plate. The number of microparticle-bound cells is determined and the cells are expanded until the next round of selection. The number of microparticle-bound cells selected on the peptide-conjugated microparticles is compared with cells bound to the naked microparticles and to the cells that do not express antibody.

FIG. 48A shows cells expressing the 30 pM HyHEL antibody (dark gray) or no antibody (light gray) after selection by incubating with streptavidin microparticles conjugated to the mature HEL protein (Protein), HEL peptide monomer (Monomer), tandem HEL dimer (Tandem), HEL MAPS dimer (MAPS) or naked unconjugated streptavidin microparticles (Naked). The whole HEL protein- and HEL peptide monomer-conjugated microparticles are effective in isolating cells expressing the 30 pM HyHEL antibody in this experiment.

In FIG. 48B, cells expressing the 800 nM HyHEL antibody (dark gray) or no antibody (light gray) are selected by incubating with tosylactivated microparticles conjugated to either the mature HEL protein (Protein) or naked unconjugated tosylactivated microparticles (Naked). The whole HEL protein-conjugated microparticles are effective in isolating cells expressing the 800 nM HyHEL antibody in this experiment.

In FIG. 49A, cells expressing the 30 pM HyHEL antibody are selected by incubating with streptavidin microparticles conjugated to the mature HEL protein (Protein), HEL peptide monomer (Monomer), tandem HEL dimer (Tandem), HEL MAPS dimer (MAPS) or naked unconjugated streptavidin microparticles (Naked). The whole HEL protein- and HEL peptide monomer-conjugated microparticles are effective in isolating cells expressing the 30 pM HyHEL antibody in this experiment.

In FIG. 49B, cells expressing the 800 nM HyHEL antibody are selected by incubating with tosylactivated microparticles conjugated to either the mature HEL protein (Protein) or naked unconjugated tosylactivated microparticles (Naked). The whole HEL protein-conjugated microparticles are effective in isolating cells expressing the 800 nM HyHEL antibody in this experiment.

G. In Vitro Affinity Maturation

A clonal population of HyHEL10 Gly31 (GGT) mutants (presented on the surface of HEK293 cells) was subjected to iterative rounds of FACS based selection against 50 pM FITC-HEL in the presence of SHM as described below to determine how effectively somatic hypermutation could restore the affinity of a relatively weakly binding mutant.

1. Transfection of Cells

A stable HEK-293 cell line expressing the [N31G LC/wt HC] anti-HEL immunoglobulin and AID activity was generated by seeding a T75 culture flask with 3×10⁶ HEK-293 cells in 10 mL DMEM medium containing 10% FBS (Invitrogen Corporation, Carlsbad, Calif.). The following day, 500 μL OptiMEM (Invitrogen Corporation, Carlsbad, Calif.), 20 μL HD-Fugene (Roche Diagnostics Corporation, Indianapolis, Ind.), 1 μg of the optimized AID expression vector (Example 5), and 1.5 μg each of the heavy and light chain expression vectors were mixed and incubated for approximately 25-30 minutes at room temperature. After incubation, this mixture was added drop-wise to the cell culture medium.

Approximately three days post-transfection, the cell growth medium was exchanged with 10 mL DMEM medium containing 10% FBS, 50 μg/mL Geneticin, 10 μL/mL Antibiotic-Antimycotic Solution, 1.5 μg/mL puromycin, 15 μg/mL blasticidin, and 350 μg/mL hygromycin (Invitrogen Corporation, Carlsbad, Calif.) and the cells were incubated for approximately four weeks with periodic re-seeding and exchange of the cell culture medium. At the end of the selection period, the cell culture was expanded and archived, and a T75 cell culture flask was seeded with 3×10⁶ HEK-293 cells that expressed the [N31G LC/wt HC] anti-HEL immunoglobulin and AID activity in 10 mL DMEM medium containing 10% FBS (Invitrogen Corporation, Carlsbad, Calif.). The following day, 500 μL OptiMEM (Invitrogen Corporation, Carlsbad, Calif.), 20 μL HD-Fugene (Roche Diagnostics Corporation, Indianapolis, Ind.), and 3 μg of the AID expression vector DNA described above were mixed and incubated for approximately 25-30 minutes at room temperature. After incubation, this mixture was added drop-wise to the cell culture medium. After approximately one week of incubation, the original stable HEK-293 cell line expressing the [N31G LC/wt HC] anti-HEL immunoglobulin and AID, as well as the culture that had been transiently transfected with additional AID expression vector, were prepared for cell sorting.

2. Selection of Higher Affinity Mutants

The HEK-293 cell line expressing the [N31G LC/wt HC] anti-HEL immunoglobulin and AID activity and the culture that had been transiently transfected with additional AID expression vector were prepared for cell sorting by collecting the cells, washing with an equal volume of PBS solution, pH 7.2, and resuspending 1×10⁷ cells from each culture in ice-cold PBS solution, pH 7.2, containing 1% (weight/volume) BSA and either 50 pM or 500 pM HEL-FITC at a final cell concentration of 2×10⁵ cells/mL.
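As a quick arithmetic check, the staining volume implied by the cell number and final cell concentration above can be worked out as follows. This is only an illustrative sketch; the variable names are hypothetical and the values are taken directly from the paragraph above.

```python
# Values taken from the sorting preparation described above.
cells_per_culture = 1e7      # cells resuspended from each culture
final_cells_per_ml = 2e5     # final cell concentration, cells/mL

# Volume of HEL-FITC staining solution implied by those two numbers.
staining_volume_ml = cells_per_culture / final_cells_per_ml
print(f"Staining volume: {staining_volume_ml:.0f} mL")
```

The result agrees with the fifty-milliliter volume stated for the same number of cells in the second round of sorting.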

Round 1

Hen egg lysozyme (Sigma Aldrich, MO) was labeled with fluorescein isothiocyanate (FITC) using the EZ-Label™ FITC protein labeling kit (Pierce, Rockford, Ill.) following the manufacturer's directions.

Following incubation for 30 minutes at 4° C., the cells were pelleted by centrifugation and the volume was reduced to 200 μL. After transfer to sterile 3-mL tubes, a 1:500 dilution of PE-conjugated goat-anti-mouse immunoglobulin was added to the cells and the cells were incubated at 4° C. for 30 minutes. The cells were then pelleted by centrifugation and resuspended in 1 mL of sterile ice-cold PBS solution, pH 7.2, containing 1% (weight/volume) BSA plus 2 nanograms/milliliter DAPI. Live IgG-positive cells that were positive for FITC (excitation with a 150 mW 488 nm laser, collection through a 528/38 filter) were isolated by fluorescence activated cell sorting (FACS) using a Cytopeia Influx Cell Sorter at a flow rate of approximately 10,000 events/second (FIG. 51). FACS windows were calibrated using HyHEL-expressing cells to ensure that higher affinity clones could be discriminated using this approach.

The results show a small population of cells that, in all cases, is clearly separated from the main bulk of non-mutated cells. In cells that have been newly transfected with the AID expression vector (panels B and D of FIG. 51), this population of cells is consistently larger than in the populations of cells that did not receive additional AID expression vector (panels A and C of FIG. 51). These cells were cultured as described below.

Sorted cells were placed in 3 mL DMEM medium containing 10% FBS, 50 μg/mL Geneticin, 10 μL/mL Antibiotic-Antimycotic solution, 1.5 μg/mL puromycin, 15 μg/mL blasticidin, and 350 μg/mL hygromycin (Invitrogen Corporation, Carlsbad, Calif.) in one well of a 6-well plate. The cells were cultured until confluent and then archived and re-seeded in one well of a 6-well plate at a cell density of 4×10⁵ cells/mL. The next day, 100 μL OptiMEM (Invitrogen Corporation, Carlsbad, Calif.), 4 μL Fugene6 (Roche Diagnostics Corporation, Indianapolis, Ind.), and 1 μg of the AID expression vector plasmid DNA were mixed and incubated for approximately 25-30 minutes at room temperature. After incubation, this mixture was added drop-wise to the cell culture medium and the cells were cultured and expanded for approximately 7 days. Samples of cells were also taken for sequence analysis.

Round 2

Cells selected using FITC-HEL in the first round, as described above, were then subjected to the same selection conditions (i.e., incubation with either 50 or 500 pM FITC-labeled HEL) in a second round of FACS sorting. Fifty milliliters (1×10⁷ cells) of the cells selected from the first round were incubated in an ice-cold PBS solution, pH 7.2, containing 1% (weight/volume) BSA and either approximately 50 pM or 500 pM HEL-FITC for 30 minutes at 4° C. The cell mixture was pelleted, the volume was reduced to 200 μL and the cells were transferred to sterile 3 mL tubes. A 1:500 dilution of PE-conjugated goat-anti-mouse Ig was added to the cells and the cells were incubated at 4° C. for 30 minutes. The cells were then pelleted and resuspended in 1 mL of an ice-cold PBS solution, pH 7.2, containing 1% (weight/volume) BSA plus 2 nanograms/milliliter DAPI. Live IgG-positive cells that were positive for FITC (excitation with a 150 mW 488 nm laser, collection through a 528/38 filter) were isolated by fluorescence activated cell sorting using a Cytopeia Influx Cell Sorter at a flow rate of approximately 10,000 events/second (FIG. 52).

The results of the second sort show a significantly larger population of cells exhibiting high affinity HEL binding, consistent with the formation of higher affinity mutants by SHM during growth and culture. In cells that have been newly transfected with the AID expression vector and then incubated with 500 pM HEL (panel D of FIG. 52), there is clearly a much larger population of highly fluorescent cells (25.9% of the population versus 6.88% for cells that did not receive additional AID expression vector; panel C in FIG. 52). These results demonstrate that re-transfection with the AID expression vector is effective in promoting a significant improvement in mutagenesis rate.

Continuing this process for 2 additional rounds of mutation with stringent gating on the selected cells (shown in FIG. 53, panel A) resulted in a profound and significant shift in the binding properties of the selected cells (FIG. 53, panel B).

3. Production of Secreted Immunoglobulins for Functional Analysis

Heavy and light chains of interest may be produced in a secreted form for further functional analysis as described below. In the case of heavy chains obtained from the surface displayed libraries, these are processed as described in Example 5 of priority U.S. Application Nos. 60/904,622 and 61/020,124 (i.e., by digestion with XhoI, followed by re-ligation), to remove the transmembrane domain, allowing for direct secretion of the antibody into the media.

Approximately one day prior to transfection, 3×10⁶ HEK-293 cells were seeded in 10 mL DMEM/10% FBS medium in a T75 culture flask and incubated overnight at 37° C. and 5% CO₂. On the day of transfection, 500 μL OptiMEM (Invitrogen Corporation, Carlsbad, Calif.), 20 μL HD-Fugene (Roche Diagnostics Corporation, Indianapolis, Ind.) and 1.5 μg each of the heavy and light chain expression vectors were mixed and incubated for approximately 25-30 minutes at room temperature. After incubation, this mixture was added drop-wise to the cell culture medium.

Approximately three days post-transfection, the cell growth medium was exchanged with 10 mL Freestyle medium (Invitrogen Corporation, Carlsbad, Calif.) and the cells were incubated for an additional 7 days. At the end of the incubation period, the cell culture supernatants were harvested and filtered through a sterile 0.2 μm filter. The secreted immunoglobulins were isolated via standard protein A affinity column chromatography prior to BIACORE analysis as described below.

HEL is immobilized onto a research grade CM5 sensor chip using standard amine coupling. Each of three surfaces is first activated for seven minutes using a 1:1 mixture of 0.1 mM N-hydroxysuccinimide (NHS) and 0.4 mM 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide (EDC). Then, the HEL sample is diluted 1- to 50-fold in 10 mM sodium acetate, pH 4.0, and exposed to the activated chip surface for different lengths of time (ten seconds to two minutes) to create three different density surfaces of HEL. Each surface is then blocked with a seven-minute injection of 1 M ethanolamine, pH 8.2. Alternatively, biotinylated HEL is diluted 100-fold and injected for different amounts of time to be captured at three different surface densities (60 RU, 45 RU, 12 RU; Response Unit (RU) is a term used by Biacore and relates to target molecule per surface area) onto a streptavidin-containing sensor chip. All experiments are performed on a Biacore® 2000 or T100 optical biosensor. Anti-HEL antibodies are supplied at 100 μg/mL and tested in a 3-fold dilution series in sample running buffer over HEL conjugated surfaces. Bound anti-HEL antibody is removed using a five-second pulse with sensor regeneration solution. All data are collected at a temperature-controlled 20° C. The kinetic responses for the antibody injections are analyzed using the non-linear least squares analysis program CLAMP (Myszka, D. G. and Morton, T. A. (1998) Trends Biochem. Sci., 23: 149-150).

4. Sequence Analysis

Sequences of the heavy and light chains isolated in the first sort were determined by PCR amplification of heavy and light chains as described below.

At least 50,000 cells taken from populations of interest were pelleted at 1100×g for 5 min. at 4° C. Pelleted cells were resuspended in 15 μL distilled H₂O and either used immediately in PCR reactions or frozen for later processing.

PCR reactions consisting of 27.6 μL H₂O, 5 μL 10× Pfx buffer, 1 μL cells from above, 8 μL of 2.5 μM of each primer (listed below), and 0.4 μL Pfx polymerase (Invitrogen Corp., Carlsbad, Calif.) for a total of 50 μL were run using the following format: 1 cycle of 95° C.×2 min., followed by 35 cycles of 95° C.×30 sec, 55° C. for 30 sec, 68° C. for 45 sec, followed by 1 cycle of 68° C. for 1 min. PCR primers used to amplify the open reading frames are:

Oligo 540: GTGGGAGGTCTATATAAGCAGAGC (SEQ ID NO: 362), which is a forward primer that maps at the 3′ end of a CMV promoter region, approximately 140 nucleotides 5′ to the ATG start codon for both heavy and light chain open reading frames.

Oligo 554: CAGAGGTGCTCTTGGAGGAGGGT (SEQ ID NO: 363), which is a heavy chain-specific reverse primer that maps in the IgG gamma chain constant region.

Oligo 552: ACACAACAGAGGCAGTTCCAGATT (SEQ ID NO: 364), which is a kappa light chain-specific reverse primer that maps near the amino end of the kappa constant region.

Oligo 577: AGTGTGGCCTTGTTGGCTTGAA (SEQ ID NO: 365), which is a lambda light chain-specific reverse primer that maps to an N-proximal constant region sequence shared by all five functional human lambda genes (IgL1, 2, 3, 6, and 7).

To amplify the heavy chain, oligos 540+554 were used.

To amplify the light chains from a population of cells (in which there was a likelihood that a mixture of both kappa and lambda light chains would be present), oligos 540, 552 and 577 were used simultaneously. In this case, the volume of water in the PCR reaction mix was adjusted to 19.6 μL.
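The reaction volumes given above for the two-primer (heavy chain) and three-primer (light chain) mixes can be checked with a short calculation. The sketch below is illustrative only; the function name `pcr_mix` is hypothetical, and the component volumes are those stated in the text.

```python
def pcr_mix(n_primers, water_ul):
    """Return (total volume in uL, final concentration of each primer in uM)."""
    buffer_ul = 5.0          # 10x Pfx buffer
    cells_ul = 1.0           # resuspended cells used as template
    primer_ul = 8.0          # volume added per primer
    primer_stock_um = 2.5    # concentration of each primer stock
    polymerase_ul = 0.4      # Pfx polymerase
    total_ul = (water_ul + buffer_ul + cells_ul
                + n_primers * primer_ul + polymerase_ul)
    primer_final_um = primer_ul * primer_stock_um / total_ul
    return round(total_ul, 1), round(primer_final_um, 2)

# Heavy chain: oligos 540 + 554 with 27.6 uL water.
heavy_chain = pcr_mix(n_primers=2, water_ul=27.6)
# Light chain: oligos 540, 552 and 577 with water adjusted to 19.6 uL.
light_chain = pcr_mix(n_primers=3, water_ul=19.6)
print(heavy_chain, light_chain)
```

Both mixes come to 50 μL, which shows why the water volume drops by 8 μL when the third primer is added: the final concentration of each primer and of the buffer is unchanged.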

Following PCR, 5 μL of sample was removed for analysis on an agarose gel. Reactions for which bands were visualized on the gel were then subjected to further PCR in the presence of Taq polymerase (Invitrogen) using the following conditions: added directly to the remaining 45 μL of PCR reaction were 2 μL H₂O, 0.5 μL Taq, 0.2 μL dNTPs at 2.5 mM each, and 1.5 μL of 50 mM MgCl₂ for a total of approximately 50 μL (or alternatively, 1 μL of 10× Taq buffer was used in place of MgCl₂ while adjusting the H₂O to maintain 50 μL final volume). PCR cycling was run as follows: 1 cycle of 95° C.×2 min., followed by 2 cycles of 95° C.×30 sec, 55° C. for 30 sec, 72° C. for 45 sec, followed by 1 cycle of 72° C. for 1 min.

Reactions for which bands were either not visualized on the gel or were otherwise judged to be too weak to continue were supplemented with 1 μL Pfx buffer, 3.7 μL H₂O, and 0.3 μL Pfx polymerase and subjected to 1 cycle of 95° C.×2 min., followed by 10 cycles of 95° C.×30 sec, 55° C. for 30 sec, 68° C. for 45 sec, followed by 1 cycle of 68° C. for 1 min.

PCR reactions for which bands were visible following analysis on an agarose gel were cloned using a TOPO® cloning kit from Invitrogen following the manufacturer's suggested protocol. In brief, 4 μL PCR reaction was added to 1 μL salt solution (provided in the TOPO® kit) plus 1 μL TOPO® cloning vector. Following a 20 min. incubation at room temp., 1 or 2 μL were used to transform 100 μL XL1 blue as per the manufacturer's suggested protocol.

Reading frames from templates whose sequences were of further interest were recovered as follows: heavy chain templates were recovered by digesting the TOPO® clones with SgrAI and NheI, which are both present in all of the original heavy chain sequences. The resulting approximately 500 bp fragments (which contain the entire variable region including all of CDR3) were cloned into the cognate sites of an expression vector already comprising the heavy chain constant region to generate an intact, contiguous heavy chain open reading frame. One version of this vector also contains the transmembrane domain and cytoplasmic domain from the murine H2kk gene as an in-frame fusion with the IgG1 constant region to permit retention of the final IgG molecule on the cell surface, as described in Example 9. The alternative version of the expression vector has the transmembrane domain deleted to allow for direct secretion of the antibodies of interest.

Similarly, light chain templates of interest were removed from their TOPO® cloning vectors using SbfI and MunI for kappa or SbfI and AclI for lambda, all of which sites are present in the original sequences. The resulting 350-400 bp fragments (which contain the entire light chain variable region including CDR3) were cloned into the cognate sites of the expression vector to generate an intact, contiguous light chain open reading frame.

The results demonstrated that in approximately 23% of the sequenced clones, there was at least one mutation within the CDR of the light chain resulting in the mutation of Glycine 31 to Aspartate (G31D). Based on the crystal structure of HyHEL 10 bound to HEL (Pons et al., (1999) Protein Science 8:958-68), this mutation would be predicted to result in the formation of an additional hydrogen bond interaction during antigen binding, which accounts for the increase in binding observed in the presence of 500 pM HEL in FIG. 52 and Biacore measurements. The type of mutations observed (FIGS. 54A and B) followed the predicted pattern of mutations for SHM mediated mutation (as shown in FIG. 50), and did not result in widespread non-specific mutation of the entire coding regions of the heavy and light chains. These results, therefore, demonstrate the ability of the system to provide good affinity discrimination, selection of improved variants of the antibodies and binding proteins of the present invention, and the ability to provide for both sustained and pulsed hypermutation directed to specific regions of interest within one or more target proteins. Furthermore, a handful of additional mutations were identified that, when recombined into a single antibody construct, improved upon the affinity of the wild-type protein from 30 pM to better than 4 pM (FIG. 54C). This example demonstrates how a single sequence or library under selective pressure and in the presence of SHM can quickly generate higher affinity mutants, and how this flow of mutational events can be predicted exactly by the computational algorithms outlined above.

The data presented herein demonstrate that the disclosed systems and seed polynucleotides for somatic hypermutation are capable of high level targeted mutagenesis of a target protein of interest. The system is capable of iterative rounds of mutagenesis and selection, enabling the directed evolution of favorable mutations while reducing the accumulation of neutral and harmful mutations, both within the protein of interest and within the expression system.

5. Episomal Rescue

As episomal vectors remain unintegrated and easily separable from a host cell's chromosomal material, plasmids can be recovered by the method of Hirt (Hirt, 1967; Kapoor and Frappier, 2005; Yates et al., 1984), transformed into competent bacteria and further manipulated to verify the sequence, identity and/or properties of the encoded polypeptides.

Using an estimate of an average of 3 resident episomes of 8000 base pairs (bp) each per cell, one can expect a yield of approximately 30 picograms (pg) per million cells (see, e.g., Formula 1). Assuming a transformation efficiency into electrocompetent E. coli of 10⁷ colonies per μg of relaxed circle DNA, one can expect approximately 300 E. coli colonies, each representing a single recovered episome, to result per million mammalian cells.

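(10⁶ cells×3 episomes/cell)×(8000 bp/episome)×(660 g/mol/bp)÷(6×10²³ episomes/mol)=2.6×10⁻¹¹ g (DNA per 10⁶ cells, i.e., approximately 30 pg); (2.6×10⁻¹¹ g)×(10⁶ μg/g)×(10⁷ colonies/μg)≈260 colonies (approximately 300 colonies per 10⁶ cells).  Formula 1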
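The yield estimate can be reproduced numerically. The following is a minimal sketch under the assumptions stated in the text (3 episomes of 8000 bp per cell, 660 g/mol per base pair, Avogadro's number rounded to 6×10²³, and the transformation efficiency of 10⁷ colonies per μg given in the preceding paragraph); the function name `episome_yield` is hypothetical.

```python
AVOGADRO = 6e23             # episomes per mole, rounded as in the text
G_PER_MOL_PER_BP = 660.0    # average molar mass of one base pair

def episome_yield(cells, episomes_per_cell, episome_bp, colonies_per_ug):
    """Return (mass of episomal DNA in grams, expected E. coli colonies)."""
    moles = cells * episomes_per_cell / AVOGADRO
    mass_g = moles * episome_bp * G_PER_MOL_PER_BP
    colonies = mass_g * 1e6 * colonies_per_ug   # 1e6 converts g to ug
    return mass_g, colonies

mass_g, colonies = episome_yield(1e6, 3, 8000, 1e7)
print(f"{mass_g * 1e12:.1f} pg DNA, about {colonies:.0f} colonies per 10^6 cells")
```

Under these assumptions the calculation gives roughly 26 pg of DNA and about 260 colonies per million cells, in line with the rounded figures of 30 pg and 300 colonies quoted above.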

Plasmids can also be recovered using a standard alkaline lysis procedure, e.g., as per a protocol from Qiagen, Inc. (for procedure, see e.g., www1.qiagen.com/literature/handbooks/PDF/PlasmidDNAPurification/PLS_QP_Miniprep/1034641_HB_QIAprep_(—)112005.pdf; and Wade-Martins et al., Nuc Acids Res 27:1674-1682 (1999)). In one aspect, transfected mammalian cells are treated the same way as the E. coli described in the Qiagen protocol. Episomes present in the final eluate are transformed into competent E. coli as described above. Using either the Hirt supernatant or alkaline lysis method requires beginning with a significant cell population for isolating resident episomes. In one non-limiting example, starting with 50,000 clonally derived cells, one might expect to obtain 10 to 20 recovered episomes as manifested in colonies of transformed E. coli.

Additionally, expression of the SV40 T antigen provides for the rapid amplification of vectors containing an SV40 Ori, thus providing a method to amplify vector number prior to episome rescue. To achieve this amplification, the SV40 T-antigen was cloned into an expression vector (as described herein) and was transiently transfected into 6.3×10⁵ HEK 293 cells that were stably harboring HyHEL10 HC and LC episomal vectors. Samples were taken at time 0 (immediately prior to transfection), and at day 1, 2 and 3. Cells were harvested by trypsinization and episomal DNA was extracted using a Qiagen miniprep kit. Extracted DNA was transformed into E. coli, which were grown on carbenicillin plates overnight, and colonies were counted the next morning (Table 18).

TABLE 18 Number of colonies resulting from HyHEL10-expressing cell population before and following transient transfection with T-antigen.

day    # cells       # colonies
0      6.3 × 10⁵     0
1      6.3 × 10⁵     35
2      6.3 × 10⁵     800-900
3      6.3 × 10⁵     >5000

Another standard method to characterize transfected genes, whether episomal or integrated, involves performing a polymerase chain reaction (PCR) directly on the relevant cell population followed by cloning and characterizing the individual resulting PCR fragments. This method has the advantage of not requiring a large starting population of cells. PCR amplification of the resident active antibody open reading frame can successfully be performed on as little as a single cell. This has the effect of shortening the time from isolation of a cell of interest to the point of sequencing the responsible open reading frame.

Another option is to perform RT-PCR on the isolated cells, thus identifying and characterizing the resident polypeptide(s) via expressed mRNA.

Example 10 Engineering Enhanced Mutants of AID

Activation induced cytidine deaminase (AID) is the primary enzyme responsible for initiating somatic hypermutation (SHM), class switch recombination (CSR) and gene conversion (GC) events during affinity maturation by the immune system. The enzyme has been especially well conserved during evolution, with the human, rat, cow, mouse and chicken orthologs exhibiting 94.4%, 93.9%, 93.9%, 92.4% and 89.4% identity to the canine (dog) amino acid sequence, respectively.

AID contains several predicted protein-protein interaction domains, post-translational modification sites and subcellular targeting motifs, one of which is a nuclear export signal (NES) that is localized in the carboxy terminal amino acids of the enzyme. The question as to whether or not a nuclear localization signal (NLS) is present within AID remains controversial, with some groups claiming such a signal exists (Ito et al., PNAS 2004 Feb. 17; 101(7):1975-80) while others maintain that no functional NLS is present (Brar et al., J. Biol. Chem. 2004 Jun. 8; 279(25):26395-401; McBride et al., J. Exp. Med. 2004 May 3; 199(9):1235-44).

Native AID is found primarily in the cytoplasmic compartment of cells, as demonstrated by cell fractionation, western blotting and immunohistochemistry. Removal or disabling of the NES tends to permit higher steady-state resident concentrations of AID in the nucleus and higher levels of SHM, but also impaired or absent CSR (Brar et al, Id.; Durandy et al., Hum. Mutat. 2006 December; 27(12):1185-91; Ito et al, Id.; McBride et al, Id.).

Example 2 above describes the design and construction of an SHM resistant form of AID (SEQ ID NO. 341) comprising a mutation in the NES (L198A) designed to disable nuclear export, thereby promoting nuclear retention. To further enhance nuclear localization and, thus, the mutator activity of AID, further engineered versions of the enzyme were created by inserting the strong nuclear localization signal (NLS; PKKKRKV; SEQ ID NO: 340) derived from the SV40 T antigen (Kalderon et al, (1984). Cell 39, 499-509) near the amino terminus. To track AID expression, a FLAG epitope tag was also inserted to create a construct (SEQ ID NO. 342) that contains both a strong NLS and the mutant NES sequence.

Additional engineered versions of AID were also created by further modifying the C-terminal NES to reduce nuclear export. These constructs were prepared with and without the SV40 T antigen NLS.

In the first pair of NES mutants, polynucleotide sequences of SEQ ID NO: 341 (without NLS) and SEQ ID NO: 342 (with NLS) were modified such that amino acid residues L181, L183, L189, L196 and L198 encoded by the polynucleotide sequences were mutated to Alanine, resulting in polynucleotide sequences of SEQ ID NO: 344 (without NLS) and SEQ ID NO: 346 (with NLS), respectively, and amino acid sequences of SEQ ID NO: 345 (without NLS) and SEQ ID NO: 347 (with NLS), respectively.

Muteins were generated by PCR, and then treated with Dpn1 to remove parental DNA.

To generate the alanine containing muteins, the following oligos were used: CAGCTCAGGAGAATCCTCGCCCCCGCTTATGAGGTCGACGACCTC (SEQ ID NO: 352) and GAGGTCGTCGACCTCATAAGCGGGGGCGAGGATTCTCCTGAGCTG (SEQ ID NO: 353).
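Mutagenic primer pairs of this kind are commonly designed as reverse complements of one another so that both strands of the plasmid template are copied with the same change. As an illustrative check (the helper name `reverse_complement` is hypothetical, and this is not presented as part of the original protocol), the two oligos listed above can be compared as follows:

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

oligo_352 = "CAGCTCAGGAGAATCCTCGCCCCCGCTTATGAGGTCGACGACCTC"  # SEQ ID NO: 352
oligo_353 = "GAGGTCGTCGACCTCATAAGCGGGGGCGAGGATTCTCCTGAGCTG"  # SEQ ID NO: 353

# True when the second oligo is exactly the reverse complement of the first.
is_complementary_pair = reverse_complement(oligo_352) == oligo_353
print(is_complementary_pair)
```

For this first pair the comparison holds, i.e., the two oligos anneal to the same site on opposite strands.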

Two separate PCR reactions were set up using vectors containing polynucleotide sequences set forth as SEQ ID NO: 341 or SEQ ID NO: 342 as template DNA, using Pfu Taq polymerase (Invitrogen) with the manufacturer's kit buffers and 2.5 μM of each deoxynucleotide (Roche). PCR was performed with the following cycle conditions: 1 cycle of 95° C. for 3 min, followed by 20 cycles of [95° C. for 45 sec, 55° C. for 45 sec, 68° C. for 17 min], followed by 1 cycle of 68° C. for 5 min. After completion, 5 μL of the PCR reaction was run on a 1% agarose gel to confirm a successful reaction. The PCR reaction mix was then treated with Dpn1 (New England Biolabs) for at least 4 hrs at 37° C. to remove the parental DNA.

Five (5) μL of the Dpn1-treated PCR reaction was added to 100 μL of XL1-Blue super competent cells (Invitrogen) and transformed per the manufacturer's suggested protocol. Following sequence verification, the resulting DNA (which contained 2 of the 4 desired mutations; i.e., 181 and 183) was used as a template with oligos CCGCTTATGAGGTCGACGACGCCAGAGATGCCTTCCGGACCG (SEQ ID NO: 354) and AGGGTCCGGAAGGCATCTCTGGCGTCGTCGACCTCATAAGCGG (SEQ ID NO: 355) in the same protocols listed above to introduce the third of four mutations (i.e., 189). Finally, oligos CCAGAGATGCCTTCCGGACCGCCGGGGCTTGATGTACAATC (SEQ ID NO: 356) and GATTGTACATCAAGCCCCGGCGGTCCGGAAGGCATCTCTGG (SEQ ID NO: 357) were used to incorporate the fourth and final mutation (i.e., 196).

The final set of alanine-containing mutein products were digested using Sac1 and BsrG1 and ligated into vector backbones cut with the cognate restriction enzymes to generate SEQ. ID. NO. 344 (without NLS) and SEQ. ID. NO. 346 (with NLS), respectively.

In a second pair of muteins, polynucleotide sequences of SEQ. ID. NO. 341 (without NLS) and SEQ. ID. NO. 342 (with NLS) were modified such that amino acid residues Asp187, Asp188 and Asp191 encoded by the polynucleotide sequences were mutated to Glutamate and amino acid residue Thr195 encoded by the polynucleotide sequences was mutated to Isoleucine, thereby creating polynucleotide sequences SEQ. ID. NO. 348 (without NLS) and SEQ. ID. NO. 350 (with NLS), respectively, and amino acid sequences of SEQ. ID. NO. 349 (without NLS) and SEQ. ID. NO. 351 (with NLS), respectively.

The same set of procedures described above with respect to the alanine muteins was repeated to generate the glutamate containing muteins of AID SEQ. ID. NO. 348 and SEQ. ID. NO. 350, except that the following oligos: TCCTCCCCCTCTATGAGGTCGAAGAACTCAGAGAAGCCTTCCGGACCCTCGGGGC (SEQ ID NO: 358) and GCCCCGAGGGTCCGGAAGGCTTCTCTGAGTTCTTCGACCTCATAGAGGGGGAGGA (SEQ ID NO: 359) were used in place of the first pair of oligos, and the following oligos: AACTCAGAGAAGCCTTCCGGATCCTCGGGGCTTGATGTACAAT (SEQ ID NO: 360) and ATTGTACATCAAGCCCCGAGGATCCGGAAGGCTTCTCTGAGTT (SEQ ID NO: 361) were used in lieu of the second pair of oligos (no third PCR reaction was needed in this case). Products were treated as described above to generate SEQ. ID. NO. 348 (without NLS) and SEQ. ID. NO. 350 (with NLS).

Results and Discussion.

The six resulting AID constructs were subsequently tested for activity in a green fluorescent protein (GFP) reversion assay, and for frequency of mutations on an immunoglobulin IgG heavy chain (HC) template.

To perform the GFP reversion assay, the TAC codon for tyrosine 82 was altered to a TAG stop codon (GFP*). GFP* was cloned into an Anaptys episomal expression vector and stably transfected into HEK 293 (note: this cell line expresses EBNA1 from an integrated copy of the gene). Each AID construct in turn was transfected into the stably transfected GFP* cell line, and cells were placed under selection (blasticidin for GFP* and hygromycin for each of the AID constructs) by day 2 post-transfection. Reversion of the stop codon back to tyrosine caused the episome-harboring cell to fluoresce green. The frequency of GFP reversion was measured by fluorescence-activated cell sorter (FACS) analysis at 3, 6, and 10 days post selection.
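The logic of the reversion assay can be illustrated with a small sketch that lists which single-base substitutions of the TAG stop codon restore a tyrosine codon. The partial codon table and the function name `single_base_revertants` are illustrative assumptions; only codons relevant to this example are included, and the text above does not state which revertant codons were actually observed.

```python
# Partial codon table: only tyrosine and stop codons are needed here.
# Any codon not listed encodes some other amino acid.
CODON_TABLE = {"TAC": "Y", "TAT": "Y", "TAG": "*", "TAA": "*", "TGA": "*"}

def single_base_revertants(stop_codon="TAG"):
    """Codons one substitution away from stop_codon that encode tyrosine."""
    hits = []
    for pos in range(3):
        for base in "ACGT":
            candidate = stop_codon[:pos] + base + stop_codon[pos + 1:]
            if candidate != stop_codon and CODON_TABLE.get(candidate) == "Y":
                hits.append(candidate)
    return hits

print(single_base_revertants())
```

Only a change at the third position of the codon (G to C or G to T) restores tyrosine, so a revertant requires a mutation at one specific base, which is why the assay reports a low but measurable frequency of fluorescent cells.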

TABLE 19 Functional competence of AID muteins as gauged by FACS analysis of GFP revertant cells gated on days 3, 6, and 10.

Vector(s)/AID variants                     % gated day 3    % gated day 6    % gated day 10
GFP* alone                                 0.04%            0.02%            0.01%
GFP* + expression of (SEQ ID. NO. 341)     0.44%            0.35%            0.39%
GFP* + expression of (SEQ ID. NO. 342)     0.31%            0.37%            0.19%
GFP* + expression of (SEQ ID. NO. 344)     0.19%            0.26%            0.21%
GFP* + expression of (SEQ ID. NO. 346)     0.36%            0.35%            0.32%
GFP* + expression of (SEQ ID. NO. 348)     0.37%            0.30%            0.41%
GFP* + expression of (SEQ ID. NO. 350)     0.18%            0.26%            0.21%

The results indicate that co-transfection with each of the six AID constructs consistently yielded GFP revertants significantly above background, indicating that all 6 muteins of AID are functional.

Because the GFP reversion assay requires both the initial activity of AID and subsequent action by error prone polymerase in order to generate a positive, reverted cell, the results can provide a qualitative yes/no for function. In order to determine actual reversion rates, a more precise template mutagenesis experiment was also conducted. Thus, in addition to the GFP reversion assay, 2 of the AID constructs (SEQ ID. NO. 341, containing the L198A mutation in the NES, and SEQ ID. NO. 342, containing the L198A NES mutation and the SV40 NLS) were tested for their ability to induce mutations in the HC of HyHEL10 IgG (Pons et al, (1999) Protein Science 8:958-68; Smith-Gill et al. (1984) J. Immunology 132:963). Episomal expression constructs (as described previously) encoding the HC of HyHEL10, an N31G mutein of the HyHEL10 light chain (LC), and either an expression vector containing SEQ ID NO: 341 or the same vector backbone containing SEQ ID NO: 342, were co-transfected into HEK 293 cells. Antibiotic selective pressure was added to the transfected cell population (i.e., blasticidin, puromycin and hygromycin for HC, LC and AID, respectively), and cells were harvested following 2 months of culture. A total of 83 IgG HC templates were sequenced from cells transfected with an expression vector comprising SEQ ID NO. 341, and 61 templates were sequenced from cells transfected with an expression vector comprising SEQ ID NO. 342. The percentage of mutations per template vs. form of AID is shown in Table 20, below. The mutation frequency calculated from the sequencing data is 1 mutation per 1438 bp generated by SEQ ID NO: 341, and 1 mutation per 1059 bp generated by SEQ ID NO: 342.

TABLE 20 Percentage of HyHEL10 IgG templates identified with mutations observed after co-expression of AID muteins SEQ ID NO. 341 or SEQ ID NO. 342

# Mutations per heavy chain template    SEQ ID. No. 341    SEQ ID. No. 342
0                                       71%                72%
1                                       26%                20%
2                                       2.4%               6.8%
3                                       0                  1.6%
4                                       0                  1.6%

The results indicate that the version of AID that contains the NLS (SEQ ID. NO. 342) induced a greater number of mutations in the HyHEL10 HC IgG template (1 per 1059 bp vs 1 per 1438 bp for the non-NLS containing homolog), and similarly resulted in a greater number of templates containing multiple mutations (10% of templates by AID+NLS vs 2.4% for AID−NLS).
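The summary figures quoted above can be recomputed from the Table 20 percentages. The sketch below is illustrative only: the dictionary layout and the function name `summarize` are hypothetical, and because the published percentages are rounded (the columns do not sum to exactly 100%), the mean number of mutations per template is approximate.

```python
# Percent of sequenced heavy chain templates carrying n mutations (Table 20).
table_20 = {
    "SEQ ID NO: 341": {0: 71.0, 1: 26.0, 2: 2.4, 3: 0.0, 4: 0.0},
    "SEQ ID NO: 342": {0: 72.0, 1: 20.0, 2: 6.8, 3: 1.6, 4: 1.6},
}

def summarize(dist):
    """Return (% of templates with >= 2 mutations, mean mutations/template)."""
    multi = sum(pct for n, pct in dist.items() if n >= 2)
    mean = sum(n * pct for n, pct in dist.items()) / 100.0
    return round(multi, 1), round(mean, 3)

summary = {name: summarize(dist) for name, dist in table_20.items()}

# Relative per-base mutation frequency, from 1 per 1059 bp vs 1 per 1438 bp.
fold_per_bp = round(1438 / 1059, 2)
print(summary, fold_per_bp)
```

The multiple-mutation fractions come out at 2.4% and 10.0%, matching the text, and both the per-template mean and the per-base frequency indicate a roughly 1.4-fold higher mutation load for the NLS-containing construct.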

Sequences

Cold canine AID. Nuclear export signal was abrogated by altering the unmodified CTT (Leu198) codon to GCT (Ala, shown underlined below).

(SEQ ID NO: 341)ATGGACTCTCTCCTCATGAAGCAGAGAAAGTTTCTCTACCACTTCAAGAACGTCAGATGGGCCAAGGGGAGACATGAGACCTATCTCTGTTACGTCGTCAAGAGGAGAGACTCAGCCACCTCTTTCTCCCTCGACTTTGGGCATCTCCGGAACAAGTCTGGGTGTCATGTCGAACTCCTCTTCCTCCGCTATATCTCAGACTGGGACCTCGACCCCGGGAGATGCTATAGAGTCACTTGGTTTACCTCTTGGTCCCCCTGTTATGACTGCGCCAGACATGTCGCCGACTTCCTCAGGGGGTATCCCAATCTCTCCCTCCGCATATTCGCCGCCCGACTCTATTTTTGTGAGGACAGGAAAGCCGAGCCCGAGGGGCTCAGGAGACTCCACCGGGCCGGGGTCCAGATCGCCATCATGACATTTAAGGACTATTTCTATTGTTGGAATACATTTGTCGAGAATCGGGAGAAGACTTTCAAAGCCTGGGAGGGGCTCCATGAGAACTCTGTCAGACTCTCTAGGCAGCTCAGGAGAATCCTCCTCCCCCTCTATGAGGTCGACGACCTCAGAGATGCCTTCCGGACCCTCGGGGCTTGA.

Features of the polynucleotide sequences (or amino acid sequences) are in 5′ to 3′ (or N- to C-terminal where appropriate) order as follows:

SacI restriction site used for cloning (boxed letters); Kozak consensus (underlined); ATG start codon (bold capital letters); FLAG epitope tag (single underline); NLS (double-underline); cold canine AID; TGA stop codon (bold capital letters); BsrGI and AscI restriction sites used for cloning (boxed letters). * indicates stop codon in protein sequence.

Flag-NLS-AID. (SEQ ID NO: 342)

ctaccacttcaagaacgtcagatgggccaaggggagacatgagacctatctctgttacgtcgtcaagaggagagactcagccacctattctccctcgactttgggcatctccggaacaagtctgggtgtcatgtcgaactcctcttcctccgctatatctcagactgggacctcgaccccgggagatgctatagagtcacttggtttacctcttggtccccctgttatgactgcgccagacatgtcgccgacttectcagggggtatcccaatctctccctccgcatattcgccgcccgactctatttttgtgaggacaggaaagccgagcccgaggggctcaggagactccaccgggccggggtccagatcgccatcatgacatttaaggactatttctattgttggaatacatttgtcgagaatcgggagaagactttcaaagcctgggaggggctccatgagaactctgtcagactctctaggcagctcaggagaatcctcctccccctctatgaggtcgacgacctcagagatgccttccggaccctcgg

(SEQ ID NO: 343)MDYKDDDDKGPKKKRKVDSLLMKQRKFLYHFTCNVRWAKGRHETYLCYVVTCRRDSATSFSLDFGHLRNKSGCHVELLFLRYTSDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGYPNLSLRTFAARLYFCEDRKAEPEGLRRLHRAGVQTATMTFKDYFYCWNTFVENREKTFKAWEGLHENSVRLSRQLRRTLLPLYEVDDLRDAFRTLGA*.

The 4 underlined-and-capitalized GCC codons (Ala) were changed from the original sequence (CTC encoding Leu) by site directed mutagenesis.

(SEQ ID NO: 344)gagctcctaaccaccATGgactctctcctcatgaagcagagaaagtttctctaccacttcaagaacgtcagatgggccaaggggagacatgagacctatctctgttacgtcgtcaagaggagagactcagccacctctttctccctcgactttgggcatctccggaacaagtctgggtgtcatgtcgaactcctcttcctccgctatatctcagactgggacctcgaccccgggagatgctatagagtcacttggtttacctcttggtccccctgttatgactgcgccagacatgtcgccgacttcctcagggggtatcccaatctctccctccgcatattcgccgcccgactctatttttgtgaggacaggaaagccgagcccgaggggctcaggagactccaccgggccggggtccagatcgccatcatgacatttaaggactatttctattgttggaatacatttgtcgagaatcgggagaagactttcaaagcctgggaggggctccatgagaactctgtcagactctctaggcagctcaggagaatcctcGCCcccGCCtatgaggtcgacgacGCCagagatgccttccggaccGCCggggctTGAtgtaca.(SEQ ID NO: 345)MDSLLMKQRKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSFSLDFGHLRNKSGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGYPNLSLRIFAARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENREKTFKAWEGLHENSVRLSRQLRRILAPAYEVDDARDAFRTAGA*.

The 4 underlined-and-capitalized GCC codons (Ala) were changed from the original sequence (CTC encoding Leu) by site directed mutagenesis. Boxes and underlines are as described above.

(SEQ ID NO: 346)

ctaccacttcaagaacgtcagatgggccaaggggagacatgagacctatctctgttacgtcgtcaagaggagagactcagccacctctttctccctcgactttgggcatctccggaacaagtctgggtgtcatgtcgaactectcttectccgctatatctcagactgggacctcgaccccgggagatgctatagagtcacttggfttacctcttggtccccctgttatgactgcgccagacatgtcgccgacttcctcagggggtatcccaatctctccctccgcatattcgccgcccgactctatttttgtgaggacaggaaagccgagcccgaggggctcaggagactccaccgggccggggtccagatcgccatcatgacatttaaggactatttctattgaggaatacatttgtcgagaatcgggagaagactttcaaagcctgggaggggctccatgagaactctgtcagactctctaggcagctcaggagaatcctcGCCcccGCCtatgaggtcgacgacGCCagagatgccttccggaccGCCggggctTGAtgtaca.(SEQ ID NO: 347)MDYKDDDDKGPKKKRKVDSLLMKQRKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSFSLDFGHLRNKSGCHVELLFLRYTSDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGYPNLSLRTFAARLYFCEDRTCAEPEGLRRLHRAGVQTATNTTFTCDYFYCWNTFVENREKTFKAWEGLHENSVRLSRQLRRTLAPAYEVDDARDAFRTAGA*.

The 3 underlined-and-capitalized GAA codons (Glu) were changed from the original sequence (aspartate-encoding codons). One additional mutation, T195I (ACC to ATC), was also generated.

(SEQ ID NO: 348)gagctcctaaccaccATGgactctctcctcatgaagcagagaaagtttctctaccacttcaagaacgtcagatgggccaaggggagacatgagacctatctctgttacgtcgtcaagaggagagactcagccacctctttctccctcgactttgggcatctccggaacaagtctgggtgtcatgtcgaactcctcttcctccgctatatctcagactgggacctcgaccccgggagatgctatagagtcacttggtttacctcttggtccccctgttatgactgcgccagacatgtcgccgacttcctcagggggtatcccaatctctccctccgcatattcgccgcccgactctatttttgtgaggacaggaaagccgagcccgaggggctcaggagactccaccgggccggggtccagatcgccatcatgacatttaaggactatttctattgttggaatacatttgtcgagaatcgggagaagactttcaaagcctgggaggggctccatgagaactctgtcagactctctaggcagctcaggagaatcctcctccccctctatgaggtcGAAGAActcagaGAAgccttccggATCctcggggctTGAtgtaca.(SEQ ID NO: 349)MDSLLMKQRKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSFSLDFGHLRNKSGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGYPNLSLRIFAARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENREKTFKAWEGLHENSVRLSRQLRRILLPLYEVEELREAFRILGA*.

The 3 underlined-and-capitalized GAA codons (Glu) were changed from the original sequence (aspartate-encoding codons). One additional mutation, T195I (ACC to ATC), was also generated. Boxes and underlines are as described above.
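As an illustrative check only, and not part of the disclosed methods, the capitalized codons in a listing of this kind can be pulled out and translated programmatically. The sketch below assumes the mutated codons are the uppercase runs within an otherwise lowercase sequence, uses the 3′ end of SEQ ID NO: 348 as printed above, and relies on a deliberately small codon table. The function name `capitalized_codons` and the table `CODON_TO_AA` are hypothetical names introduced for this example.

```python
import re

# Partial codon table: only the codons discussed in the surrounding text.
# (Assumption: a full genetic-code table would be used in practice.)
CODON_TO_AA = {
    "GAA": "Glu", "GAC": "Asp", "GAT": "Asp",
    "ACC": "Thr", "ATC": "Ile",
    "GCC": "Ala", "CTC": "Leu",
    "ATG": "Met", "TGA": "Stop",
}


def capitalized_codons(seq):
    """Return (offset, codon, residue) for each codon in an uppercase run.

    Each uppercase run is split into consecutive triplets starting at the
    first uppercase base; trailing bases that do not fill a triplet are
    ignored. Codons missing from CODON_TO_AA are reported as "?".
    """
    hits = []
    for match in re.finditer(r"[ACGT]+", seq):
        run = match.group()
        usable = len(run) - len(run) % 3
        for i in range(0, usable, 3):
            codon = run[i:i + 3]
            hits.append((match.start() + i, codon, CODON_TO_AA.get(codon, "?")))
    return hits


# 3' end of SEQ ID NO: 348 as printed in the listing above.
fragment = "gaggtcGAAGAActcagaGAAgccttccggATCctcggggctTGAtgtaca"

for offset, codon, residue in capitalized_codons(fragment):
    print(offset, codon, residue)
```

Run on this fragment, the sketch reports three GAA (Glu) codons, one ATC (Ile) codon, and the TGA stop codon, which matches the description of the three Asp-to-Glu changes and the T195I substitution.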

(SEQ ID NO: 350)

ctaccacttcaagaacgtcagatgggccaaggggagacatgagacctatctctgttacgtcgtcaagaggagagactcagccacctctttctccctcgactttgggcatctccggaacaagtctgggtgtcatgtcgaactcctcttcctccgctatatctcagactgggacctcgaccccgggagatgctatagagtcacttggtttacctcttggtccccctgttatgactgcgccagacatgtcgccgacttcctcagggggtatcccaatctctccctccgcatattcgccgcccgactctatttttgtgaggacaggaaagccgagcccgaggggctcaggagactccaccgggccggggtccagatcgccatcatgacatttaaggactatttctattgttggaatacatttgtcgagaatcgggagaagactttcaaagcctgggaggggctccatgagaactctgtcagactctctaggcagctcaggagaatcctcctccccctctatgaggtcGAAGAActcagaGAAgccttccggATCctcggggctTGAtgtaca. (SEQ ID NO: 351)MDYKDDDDKGPKKKRKVDSLLMKQRKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSFSLDFGHLRNKSGCHVELLFLRYTSDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGYPNLSLRTFAARLYFCEDRTCAEPEGLRRUTRAGVQTATNTTFKDYFYCWNTFVENREKTFTCAWEGLHENSVRLSRQLRRTLLPLYEVEELREAFRILGA.

While preferred embodiments of the present invention have been shown and described herein, such embodiments are provided by way of example only. It should be understood that various alternatives to the embodiments of the invention described herein can be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

REFERENCES

-   1. Wang et al. Evolution of new non-antibody proteins via iterative somatic hypermutation. Proc Natl Acad Sci USA. 2004 Nov. 30; 101(48):16745-16749.
-   2. Yelamos et al. Targeting of non-Ig sequences in place of V segment by somatic hypermutation. Nature 1995; 376:225-229.
-   3. Zheng et al. Intricate targeting of immunoglobulin somatic hypermutation maximizes the efficiency of affinity maturation. J Exp Med. 2005 May 2; 201(9):1467-1478.
-   4. Ruckerl et al. Episomal vectors to monitor and induce somatic hypermutation in human Burkitt-Lymphoma cell lines. Mol. Immunol. 2006 April; 43(10):1645-1652.
-   5. Bachl et al. Increased transcription levels induce higher mutation rates in a hypermutating cell line. J. Immunol. 2001 Apr. 15; 166(8):5051-5057.
-   6. Cumbers et al. Generation and iterative affinity maturation of antibodies in vitro using hypermutating B-cell lines. Nat. Biotechnol. 2002 November; 20(11):1129-1134.
-   7. Neuberger et al. Somatic hypermutation at A.T pairs: polymerase error versus dUTP incorporation. Nat Rev Immunol. 2005 February; 5(2):171-178. Review.
-   8. Wang et al. Genome-wide somatic hypermutation. Proc Natl Acad Sci USA. 2004 May 11; 101(19):7352-7356.
-   9. Wang and Wabl. Hypermutation rate normalized by chronological time. J. Immunol. 2005 May 1; 174(9):5650-5654.
-   10. Martin et al. Somatic hypermutation of the AID transgene in B and non-B cells. Proc Natl Acad Sci USA. 2002 Sep. 17; 99(19):12304-12308.
-   11. Shinkura R, et al. Separate domains of AID are required for somatic hypermutation and class-switch recombination. Nat. Immunol. 2004 July; 5(7):707-712.
-   12. Zhang (Scharff) et al. Clonal instability of V region hypermutation in the Ramos Burkitt's lymphoma cell line. Int Immunol. 2001 September; 13(9):1175-1184.
-   13. Ruckerl and Bachl. Activation induced cytidine deaminase fails to induce a mutator phenotype in the human pre-B cell line Nalm6. Eur. J. Immunol. 2005; 35:290-298.
-   14. Rogozin and Diaz. Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process. J. Immunol., 2004, 172:3382-3384.
-   15. Martin et al. Activation-induced cytidine deaminase turns on somatic hypermutation in hybridomas. Nature. 2002 Feb. 14; 415(6873):802-806.
-   16. U.S. Pat. No. 6,815,194
-   17. U.S. Pat. No. 5,885,827
-   18. Coker et al. (2006) Genetic and in vitro assays of DNA deamination. Methods Enzymology 408 156-170
-   20. Conticello et al. (2005) Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol. Biol. Evol. 22 (2) 367-377
-   21. Odegard et al. (2006) Targeting of somatic hypermutation. Nature Rev. Imm. 6 573-583
-   22. Shen et al. (2006) Somatic hypermutation and class switch recombination in Msh6−/−Ung−/− double-knockout mice. J. Imm. 177 5386-5392
-   23. Neuberger et al. (2005) Somatic hypermutation at A.T pairs: polymerase error versus dUTP incorporation. Nat. Rev. Immunol. 5(2) 171-8
-   24. Rogozin et al. (2004) Cutting Edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two step activation induced cytidine deaminase triggered process. J. Imm. 172 3382-3384
-   25. Wilson et al. (2005) MSH2-MSH6 stimulates DNA polymerase eta, suggesting a role for A:T mutations in antibody genes. J. Exp. Med. 201 (4) 637-645
-   26. Santa-Marta et al. (2006) HIV-1 vif protein blocks the cytidine deaminase activity of B-cell specific AID in the E. coli by a similar mechanism of action. Mol. Imm. 44 583-590
-   27. Zan et al. (2005) The translesion DNA polymerase theta plays a dominant role in immunoglobulin gene somatic hypermutation. EMBO J. 24 3757-3769
-   28. Watanabe et al. (2004) Rad18 guides pol eta to replication stalling sites through physical interaction and PCNA monoubiquitination. EMBO J. 23 3886-3896
-   29. Besmer et al. (2006) The transcription elongation complex directs activation induced cytidine deaminase mediated DNA deamination. Mol. Cell. Biol. 26 (11) 4378-4385.
-   30. Steele et al. (2006) Computational analyses show A to G mutations correlate with nascent mRNA hairpins at somatic hypermutation hotspots. DNA Repair doi:10.1016/j.dnarep.2006.06.002
-   31. Odegard et al. (2005) Histone modifications associated with somatic hypermutation. Immunity 23 101-110
-   32. Komori et al. (2006) Biased dA/dT somatic hypermutation as regulated by the heavy chain intronic iEu enhancer and 3′ E alpha enhancers in human lymphoblastoid B cells. Mol. Imm. 43 1817-1826
-   33. Rada et al. (2001) The intrinsic hypermutability of antibody heavy and light chain genes decays exponentially. EMBO J. 20 4570-4576
-   34. Larijani et al. (2006) Mol. Cell. Biol. doi:10.1128/MCB.00824-06.
-   35. Larijani et al. (2005) Methylation protects cytidines from AID-mediated deamination. Mol. Immunol. 42(5) 599-604
-   36. Poltoratsky et al. (2006) Down regulation of DNA polymerase beta accompanies somatic hypermutation in human BL2 cell lines. DNA Repair. 2006 doi:10.1016/j.dnarep.2006.10.003
-   37. Hirt (1967) Selective extraction of polyoma DNA from infected mouse cell cultures. J. Mol. Biol. 26:365-369.
-   38. Kapoor and Frappier (2005) Methods for measuring the replication and segregation of Epstein-Barr virus-based plasmids. Methods Mol Biol. 292:247-66.
-   39. Wade-Martins et al. (1999) Long-term stability of large insert genomic DNA episomal shuttle vectors in human cells. Nuc Acids Res 27:1674-1682
-   40. Qiagen, Inc. alkaline lysis procedure, see www1.qiagen.com/literature/handbooks/PDF/PlasmidDNAPurification/PLS_QP_Miniprep/1034641_HB_QIAprep_(—)112005.pdf
-   41. Yates et al. (1984) A cis-acting element from the Epstein-Barr viral genome that permits stable replication of recombinant plasmids in latently infected cells. PNAS 81:3806-3810.
-   42. Baker (2005) The selectivity of beta-adrenoceptor antagonists at the human beta1, beta2 and beta3 adrenoceptors. Br J. Pharmacol. February; 144(3):317-22.
-   43. Fitzgerald et al. (1998) Pharmacological and biochemical characterization of a recombinant human galanin GALR1 receptor: agonist character of chimeric galanin peptides. J Pharmacol Exp Ther. 1998 November; 287(2):448-56.
-   44. Ghosh et al. (2006) Design, synthesis, and progress toward optimization of potent small molecule antagonists of CC chemokine receptor 8 (CCR8). J Med. Chem. May 4; 49(9):2669-72.
-   45. Gillian R. et al. (2004) Quantitative Assays of Chemotaxis and Chemokinesis for Human Neural Cells. ASSAY and Drug Development Technologies. 2(5):465-472.
-   46. Hintermann et al. (2005) Integrin Alpha6-Beta4-erbB2 Complex Inhibits Haptotaxis by Up-regulating E-cadherin Cell-Cell Junctions in Keratinocytes. J. Biol. Chem. 280(9):8004-8015.
-   47. Iwatsubo et al. (2003) J. Cardiovasc Pharmacol. January; 41 Suppl 1:S53-56.
-   48. Gearhart and Wood (2001) Emerging links between hypermutation of antibody genes and DNA polymerases. Nature Rev. Immunol. 1:187-192.
-   49. Kawamura et al. (2004) DNA polymerase theta is preferentially expressed in lymphoid tissues and upregulated in human cancers. Int. J. Cancer 109(1):9-16.
-   50. Zan et al. (2005) The translesion DNA polymerase theta plays a dominant role in immunoglobulin gene somatic hypermutation. EMBO Journal 24, 3757-3769.
-   51. Zeng et al. (2001) DNA polymerase eta is an A-T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 2(6):537-41.
-   52. Habel et al. (2004) Maintenance of Epstein-Barr virus-derived episomal vectors in the murine Sp2/0 myeloma cell line is dependent upon exogenous expression of human EBP2. Biochem Cell Biol. 82(3):375-80.
-   53. Kapoor et al. (2001) Reconstitution of Epstein-Barr virus-based plasmid partitioning in budding yeast. EMBO J. 20(1-2):222-30.

1.-31. (canceled)
 32. A method for preparing a gene product having a desired property, comprising: a) expressing in a population of cells at least one somatic hypermutation (SHM) resistant synthetic gene encoding a gene product, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding an unmodified gene product has been replaced by one or more second SHM motifs having a lower probability of SHM, the synthetic gene having a greater density of cold spots and/or a lower density of hot spots than the unmodified polynucleotide sequence, and wherein the population of cells express AID activity, or can be induced to express AID activity via the addition of an inducing agent; and b) selecting a cell or cells within the population of cells which express a modified gene product having the desired property.
 33. The method of claim 32, optionally further comprising activating or inducing the expression of AID activity in the population of cells.
 34. The method of claim 32, further comprising establishing one or more clonal populations of cells from the cell or cells identified in (b).
 35. The method of claim 32, wherein the at least one SHM resistant synthetic gene encodes an enzyme involved in SHM, a synthetic selectable marker gene, or a synthetic reporter gene.
 36. The method of claim 32, wherein the population of cells further comprises a gene product encoded by a SHM susceptible synthetic gene, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding an unmodified gene product has been replaced by one or more second SHM motifs having a higher probability of SHM, the synthetic gene having a greater density of hot spot motifs and/or a lower density of cold spots than the unmodified polynucleotide sequence.
 37. A method for preparing a gene product having a desired property, comprising: a) preparing a somatic hypermutation (SHM) susceptible synthetic gene encoding a gene product, wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding an unmodified gene product has been replaced by one or more second SHM motifs having a higher probability of SHM, the synthetic gene having a greater density of hot spot motifs and/or a lower density of cold spots than the unmodified polynucleotide sequence, b) expressing the SHM susceptible synthetic gene in a population of cells; wherein the population of cells expresses AID activity, or can be induced to express AID activity via the addition of an inducing agent; and c) selecting a cell or cells within the population of cells which express a modified gene product having the desired property.
 38. The method of claim 37, optionally further comprising activating or inducing the expression of AID activity in the population of cells.
 39. The method of claim 37, further comprising establishing one or more clonal populations of cells from the cell or cells identified in (c).
 40. The method of claim 37, wherein the SHM susceptible synthetic gene encodes an antibody or antigen-binding fragment thereof, a neurotransmitter, a hormone, a cytokine, a chemokine, an enzyme, a receptor, a toxin, a co-factor, or a transcription factor.
 41. The method of claim 37, wherein the population of cells further comprises a gene product encoded by a SHM resistant synthetic gene wherein one or more first SHM motifs in an unmodified polynucleotide sequence encoding an unmodified gene product has been replaced by one or more second SHM motifs having a lower probability of SHM, the synthetic gene having a greater density of cold spots and/or a lower density of hot spots than the unmodified polynucleotide sequence.